USING THE BOOTSTRAP TO QUANTIFY THE AUTHORITY OF
AN EMPIRICAL RANKING 

By Peter Hall and Hugh Miller 

University of Melbourne 

The bootstrap is a popular and convenient method for quantify- 
ing the authority of an empirical ordering of attributes, for example 
of a ranking of the performance of institutions or of the influence of 
genes on a response variable. In the first of these examples, the num- 
ber, p, of quantities being ordered is sometimes only moderate in size; 
in the second it can be very large, often much greater than sample 
size. However, we show that in both types of problem the conventional 
bootstrap can produce inconsistency. Moreover, the standard n-out- 
of-n bootstrap estimator of the distribution of an empirical rank may 
not converge in the usual sense; the estimator may converge in dis- 
tribution, but not in probability. Nevertheless, in many cases the 
bootstrap correctly identifies the support of the asymptotic distribu- 
tion of ranks. In some contemporary problems, bootstrap prediction 
intervals for ranks are particularly long, and in this context, we also 
quantify the accuracy of bootstrap methods, showing that the stan- 
dard bootstrap gets the order of magnitude of the interval right, but 
not the constant multiplier of interval length. The m-out-of-n boot- 
strap can improve performance and produce statistical consistency, 
but it requires empirical choice of m; we suggest a tuning solution 
to this problem. We show that in genomic examples, where it might 
be expected that the standard, "synchronous" bootstrap will suc- 
cessfully accommodate nonindependence of vector components, that 
approach can produce misleading results. An "independent compo- 
nent" bootstrap can overcome these difficulties, even in cases where 
components are not strictly independent. 

1. Introduction. The ordering of a sequence of random variables is of- 
ten a major aspect of contemporary statistical analyses. For example, data 
on the comparative performance of institutions (e.g., local governments, or 
health providers, or universities) are frequently summarized by reporting
the ranking of empirical values of a performance measure; and the relative 
influence of genes on a particular response is sometimes indicated by ranking 
the values of the weights that are applied to them after the application of a 
variable selector, such as the lasso. It is reasonable to argue that, especially 
in contentious situations, no ranking should be unaccompanied by a mea- 
sure of its authority [Goldstein and Spiegelhalter (1996)]. The bootstrap is 
a popular approach to developing such a measure. 

In this paper, we report on both the theoretical and the numerical prop- 
erties of bootstrap estimators of the distributions of rankings. We show that 
the standard n-out-of-n bootstrap generally fails to give consistency, and in 
fact may not produce distribution estimators that converge either almost 
surely or in probability. The m-out-of-n bootstrap overcomes these diffi- 
culties, but requires empirical choice of m. We suggest a tuning approach 
to solving this problem. This technique remains appropriate in cases where 
the number, p say, of populations is very large, although in that context, 
one could also regard m as a means of setting the level of sensitivity of the 
bootstrap to near-ties among ranks, rather than as a smoothing parameter. 

In some contemporary prediction problems, the empirical rank is quite 
highly variable. We develop mathematical models in this setting, and explore 
the validity of bootstrap methods there. In particular, we show that the 
inherent inconsistency of the standard n-out-of-n bootstrap does not prevent 
that method from correctly capturing the order of magnitude of the expected 
value of rank, or the expected length of prediction intervals, although it leads 
to errors in estimators of the constant multiplier of that order of magnitude. 

Another issue is that of adequately reflecting, in the bootstrap algorithm, 
dependence among the datasets representing the different populations, for 
example, data on the performances of different health providers, or on the 
expression levels of different genes. In examples of the first type, where 
different institutions are being ranked, the assumption of independence is 
often appropriate; it can usually be accommodated through conditioning. In 
such cases, resampling can be implemented in a way that explicitly reflects 
population-wise independence. 

However, in the genomic example, data on expression levels of different 
genes from the same individual are generally not independent. In this set- 
ting, using the standard nonparametric bootstrap to assess the authority of 
ranking would seem to be a good choice, since in more conventional prob- 
lems it captures well the dependence structure of data vectors. However, we 
show that, even when the number of variables being ranked is much less 
than sample size, the standard approach can give unreliable results in some 
problems. This is largely because knowing the composition of a resample for 
the jth population (e.g., for the jth gene, in the genomic example) identifies 



QUANTIFYING RANKING 



3 



exactly the resamples for other genes. Therefore, the resamples for different 
populations are hardly independent, even conditional on the original data. 

This has a variety of repercussions. For example, it implies that standard 
bootstrap probabilities, when computed conditional on the information we 
have in the resample about the jth gene, degenerate to indicator functions. 
Conditional inference is attractive in ranking problems, since it can lead 
to substantial reductions in variability. To overcome the problem, we sug- 
gest using an "independent component" version of the bootstrap, where the 
bootstrap is applied as though the ranked variables were statistically inde- 
pendent. This approach can be valid even in the case of nonindependence. 
(In order to make it clear that in this setting we use the term "standard 
bootstrap" to mean the resampling of p-vectors of data, we shall refer to 
this bootstrap method as the "synchronous" bootstrap; the standard boot- 
strap results in vector components being synchronised with one another in 
each resampling step.) 
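The distinction between the two resampling schemes can be made concrete in a few lines. The Python sketch below is ours, not the authors': it assumes the data are stored as a list of n rows, each row a p-vector, and the function names are hypothetical.

```python
import random


def synchronous_resample(data, rng):
    """Standard ("synchronous") bootstrap for an n-by-p data matrix.

    Whole rows (p-vectors) are resampled: a single set of row indices is
    drawn and shared by every component, so knowing which rows were drawn
    for component j identifies the resample for every other component.
    """
    n = len(data)
    rows = [rng.randrange(n) for _ in range(n)]
    return [list(data[i]) for i in rows]


def independent_component_resample(data, rng):
    """The "independent component" bootstrap: each column (component) is
    resampled with its own, independently drawn row indices, as though the
    p components were statistically independent."""
    n, p = len(data), len(data[0])
    out = [[None] * p for _ in range(n)]
    for j in range(p):
        for i in range(n):
            out[i][j] = data[rng.randrange(n)][j]
    return out
```

For instance, if the second component of every data vector equals ten times the first, every row of a synchronous resample preserves that relationship, whereas an independent-component resample generally does not.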

It is possible to generalize our treatment to cases where several rankings 
are undertaken jointly, for example, where universities are ranked simulta- 
neously in terms of the quality of their graduate programs and the career 
prospects of their undergraduates. Our main conclusions about the relative 
merits of different bootstrap methods persist in this more general setting, 
although a detailed treatment of that case would be significantly longer and 
more complex. 

Work on the bootstrapping of statistics related to ranks includes that of 
Srivastava (1987), who introduced bootstrap methods for a class of ranking 
and slippage problems (although not directly related to the problems dis- 
cussed in this paper); Tu, Burdick and Mitchell (1992), who discussed boot- 
strap methods for canonical correlation analysis; Larocque and Leger (1994), 
Steland (1998) and Peilin et al. (2000), who developed bootstrap methods for 
quantities such as rank tests and rank statistics; Goldstein and Spiegelhalter 
(1996), who discussed bootstrap methods for constructing interval estimates; 
Langford and Leyland (1996), who addressed bootstrap methods for rank- 
ing the performance of doctors; Cesario and Barreto (2003), Hui, Modarres 
and Zheng (2005) and Taconeli and Barreto (2005), who discussed boot- 
strap methods for ranked set sampling; and Mukherjee et al. (2003), who 
developed methods for gene ranking using bootstrapped p-values. 

2. Model and methodology. 

2.1. Model. Assume we have datasets $\mathcal{X}_1, \dots, \mathcal{X}_p$ drawn
from populations $\Pi_1, \dots, \Pi_p$, respectively, and that for the jth
population there is an associated parameter $\theta_j$ which measures, for
example, the strength of an attribute in the population, or the performance
of an individual or an organisation related to the population, or the esteem
in which an institution or a program is held. If the $\theta_j$'s were known,
then our ranking of the populations would be

$$r_j = 1 + \sum_{k \ne j} I(\theta_k \ge \theta_j) \quad \text{for } j = 1, \dots, p, \tag{2.1}$$

say, signifying that rj is the rank of the jth population. Here, tied rankings 
can be considered to have been broken arbitrarily, for example at random. 

We wish to develop an empirical version of the ranking at (2.1). For this
purpose, we compute from $\mathcal{X}_j$ an estimator $\hat\theta_j$ of $\theta_j$,
for $1 \le j \le p$, and we rank the populations in terms of the values of
$\hat\theta_j$. In particular, if we have $\hat\theta_1, \dots, \hat\theta_p$, then
we write

$$\hat r_j = 1 + \sum_{k \ne j} I(\hat\theta_k \ge \hat\theta_j) \quad \text{for } j = 1, \dots, p, \tag{2.2}$$

to indicate the empirical version of (2.1). Again, ties can be broken
arbitrarily, although in the case of (2.2) the noise implicit in the
estimators $\hat\theta_j$ often means that there are no exact ties.

We shall treat two cases, "fixed p" and "large p", distinguished in
theoretical models by taking p fixed and allowing n to diverge, and by
permitting p to diverge, respectively. Cases covered by the latter model
include instances where $\mathcal{X}$ is a set of p-vectors, say
$\mathcal{X} = \{X_1, \dots, X_n\}$ where $X_i = (X_{i1}, \dots, X_{ip})$. There,
$\mathcal{X}_j = \{X_{1j}, \dots, X_{nj}\}$ is the set of jth components of each
data vector, and in particular each $\mathcal{X}_j$ is of the same size. This
example arises frequently in contemporary problems in genomics, where $X_i$ is
the vector of expression-level data on perhaps p = 5000 to 20,000 genes for
the ith individual in a population. In such cases, n can be relatively small,
for example, between 20 and 200. The vectors $X_i$ can generally be regarded
as independent, but not so the components $\mathcal{X}_1, \dots, \mathcal{X}_p$.
However, as we shall argue in Section 4.1, there may be advantages in
conducting inference as though the components were independent, even when
that assumption is incorrect.

2.2. Basic bootstrap methodology. The authority of the ranking at (2.2),
as an approximation to that at (2.1), can be queried. A simple approach to
quantifying the authority is to repeat the ranking many times in the context
of bootstrap resamples $\mathcal{X}_1^*, \dots, \mathcal{X}_p^*$, which replace the
respective datasets $\mathcal{X}_1, \dots, \mathcal{X}_p$. In particular, for each
sequence $\mathcal{X}_1^*, \dots, \mathcal{X}_p^*$, we can compute the respective
versions $\hat\theta_1^*, \dots, \hat\theta_p^*$ of the estimators of $\theta_j$,
and calculate the bootstrap version of (2.2):

$$\hat r_j^* = 1 + \sum_{k \ne j} I(\hat\theta_k^* \ge \hat\theta_j^*) \quad \text{for } j = 1, \dots, p. \tag{2.3}$$
The bootstrap here can be of conventional n-out-of-n type, either parametric
or nonparametric, or it can be the m-out-of-n bootstrap (again either
parametric or nonparametric), where the resamples $\mathcal{X}_j^*$ are of
smaller size than the respective samples $\mathcal{X}_j$. For definiteness, in
Section 4, where we need to refer explicitly to the implementation of
bootstrap methods, we shall use the nonparametric bootstrap. However, our
conclusions also apply to parametric bootstrap methods. More generally, the
way in which the bootstrap resamples $\mathcal{X}_j^*$ are constructed can depend
on the nature of the data. See Section 4.1 for discussion.
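A minimal Python sketch of (2.2) and (2.3) follows. It assumes, for illustration only, that each $\theta_j$ is estimated by the sample mean of $\mathcal{X}_j$, that the datasets are independent and are resampled separately, and that a common resample size is used for all populations in the m-out-of-n case; the function names are hypothetical.

```python
import random
from statistics import fmean


def empirical_ranks(estimates, rng):
    """Ranks as in (2.2): rank 1 goes to the largest estimate.

    Exact ties (rare for continuous estimators) are broken at random via a
    secondary random sort key.
    """
    p = len(estimates)
    keys = [(-estimates[j], rng.random()) for j in range(p)]
    order = sorted(range(p), key=lambda j: keys[j])
    ranks = [0] * p
    for position, j in enumerate(order, start=1):
        ranks[j] = position
    return ranks


def bootstrap_ranks(samples, n_boot, rng, m=None):
    """Bootstrap ranks r*_1, ..., r*_p as in (2.3).

    samples: list of p datasets, assumed independent, one per population.
    theta_j is estimated by the sample mean (an assumption of this sketch).
    m: common resample size for every population; None gives the
       n-out-of-n bootstrap, an integer gives the m-out-of-n bootstrap.
    Returns n_boot rank vectors, each of length p.
    """
    replicates = []
    for _ in range(n_boot):
        estimates = []
        for x in samples:
            size = len(x) if m is None else m
            # nonparametric resample, drawn with replacement
            estimates.append(fmean(rng.choices(x, k=size)))
        replicates.append(empirical_ranks(estimates, rng))
    return replicates
```

The proportion of replicates in which population j receives rank at most r is then the bootstrap estimate of $P(\hat r_j \le r)$.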

One question in which we are obviously interested is whether the bootstrap
captures the distribution of $\hat r_j$ reasonably well, for example, whether

$$P(\hat r_j^* \le r \mid \mathcal{X}) - P(\hat r_j \le r) \to 0 \tag{2.4}$$

in probability for each integer r, as $n \to \infty$. The answer to this
question, if we use the familiar n-out-of-n bootstrap, is generally "only in
cases where the limiting distribution of $\hat r_j$ is degenerate." However, the
answer is more positive if we employ the m-out-of-n bootstrap. There, if the
populations $\Pi_1, \dots, \Pi_p$ are kept fixed in an asymptotic study, then

(2.5) the limiting distribution of $\hat r_j$ is supported on the set of
integers $\{k_1 + 1, k_1 + 2, \dots, k_2\}$, where
$k_1 = \sum_k I(\theta_k > \theta_j)$ and $k_2 = \sum_k I(\theta_k \ge \theta_j)$,

and the m-out-of-n bootstrap consistently estimates this distribution. In 
particular, (2.4) holds; see Section 3 for details. However (still in the case of 
fixed p), if we are more ambitious and permit the population distributions 
to vary with n in such a way that the limiting distribution is more complex 
than that prescribed by (2.5), then even the m-out-of-n bootstrap may fail 
to give consistency. 
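To make (2.5) concrete: for fixed populations, the support depends only on how many $\theta_k$ exceed $\theta_j$ and how many tie with it. A small helper in Python (hypothetical name, 0-based indices):

```python
def limiting_support(theta, j):
    """Support {k1 + 1, ..., k2} of the limiting distribution of the empirical
    rank of population j, as in (2.5), when the parameters theta are fixed.

    k1 counts the parameters strictly larger than theta[j]; k2 counts those at
    least as large as theta[j], including theta[j] itself, so the support has
    one point for each member of the tie group containing j.
    """
    k1 = sum(1 for t in theta if t > theta[j])
    k2 = sum(1 for t in theta if t >= theta[j])
    return list(range(k1 + 1, k2 + 1))
```

For the tied configuration used in Section 3.6, with $\theta_j = 1$ for $j \le 5$ and $\theta_j = 0$ otherwise, this gives the support {1, ..., 5} for the first five populations and {6, ..., 10} for the rest.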

Having computed a bootstrap approximation $P(\hat r_j^* \le r \mid \mathcal{X})$ to
the probability $P(\hat r_j \le r)$, we can calculate an empirical
approximation to a prediction interval, specifically an interval of ranks
within which $\hat r_j$ lies with given probability, for example, 0.95.
Goldstein and Spiegelhalter (1996) refer to such intervals as "overlap
intervals," since they are generally displayed in a figure which shows the
extent to which they overlap. Particularly when p is relatively small, the
discrete nature of the distribution of $\hat r_j$ makes it a little awkward to
discuss the accuracy of bootstrap prediction intervals, and so we focus
instead on measures of the accuracy of distributional approximations, for
example, (2.4) and (2.5).
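This section does not fix a particular construction of the interval. One simple choice, used here purely as an assumption, is the equal-tailed percentile interval of the bootstrap ranks; the function name below is hypothetical.

```python
import math


def rank_prediction_interval(boot_ranks, level=0.95):
    """Equal-tailed percentile interval (lo, hi) for a rank, computed from
    bootstrap draws r*_j of a single population j.

    At most a fraction (1 - level) / 2 of the draws lies strictly below lo,
    and likewise strictly above hi, so the empirical bootstrap coverage of
    [lo, hi] is at least `level`.
    """
    draws = sorted(boot_ranks)
    b = len(draws)
    # number of draws that may be trimmed from each tail; the small epsilon
    # guards against floating-point error such as 0.05 * 100 = 4.999...
    trim = math.floor((1.0 - level) / 2.0 * b + 1e-9)
    return draws[trim], draws[b - 1 - trim]
```

Because ranks are integers, the achieved coverage typically exceeds the nominal level, which is one face of the discreteness issue noted above.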
3. The case of p distinct populations.

3.1. Preliminary discussion. Write $n_j$ for the size of the sample
$\mathcal{X}_j$. The values of $n_j$ may differ, but we shall assume that they
are all of the same order. That is, writing $n = p^{-1} \sum_j n_j$ for the
average sample size, we have:

$$n^{-1} \sup_{1 \le j \le p} n_j = O(1), \qquad 1 = O\Big(n^{-1} \inf_{1 \le j \le p} n_j\Big). \tag{3.1}$$

When interpreting (3.1) it is convenient to think of n as the "asymptotic
parameter," that is, the quantity which we take to diverge to infinity, and
to consider $n_1, \dots, n_p$ as functions of n.

When using the m-out-of-n bootstrap, where a resample of size $m_j < n_j$ is
drawn either from the population distribution with estimated parameters (the
parametric case) or by with-replacement resampling from the sample
$\mathcal{X}_j$ (the case of the nonparametric bootstrap), and $\mathcal{X}_j$ is
of size $n_j$, we assume that the average resample size,
$m = p^{-1} \sum_j m_j$, satisfies the analogue of (3.1):

$$m^{-1} \sup_{1 \le j \le p} m_j = O(1), \qquad 1 = O\Big(m^{-1} \inf_{1 \le j \le p} m_j\Big). \tag{3.2}$$

Furthermore, we ask that m be large but m/n be small. 

In the cases of both fixed and divergent p, the properties of $\hat r_j$ and
$\hat r_j^*$ are strongly influenced by the potential presence of tied values of
$\theta_j$. However, it is perhaps unreasonable to assume, in practice, that two
values of $\theta_j$ are exactly tied, although there might be cases where two
values are so close that, for most practical purposes, the properties of
$\hat r_j$ for small to moderate n are similar to those that would occur if the
values were tied. The borderline case is that where two values of $\theta_j$
differ by only a constant multiple of $n^{-1/2}$, with n denoting average
sample size. (This requires the distributions of the populations $\Pi_j$ to
vary with n.) If the constant is sufficiently large, then, practically
speaking, the two values of $\theta_j$ are not tied, but if the constant is
small then a tie might appear to be present.

To reflect this viewpoint, we shall, for any particular j and for all
$k \ne j$, write

$$\theta_k = \theta_j + n^{-1/2} \omega_{jk}, \tag{3.3}$$

where the $\omega_{jk}$'s are permitted to depend on n. Of course, (3.3) amounts
to a definition of $\omega_{jk}$, and if the quantities $\theta_k$, for
$1 \le k \le p$, are all fixed then (3.3) implies that $\omega_{jk}$ either
vanishes or diverges to either $+\infty$ or $-\infty$, in the latter two cases in
proportion to $n^{1/2}$. However, since we shall permit the distributions of
the populations $\Pi_k$, and hence also the $\theta_k$'s, to depend on n, the
problem can be set up in such a way that the $\omega_{jk}$'s have many
different modes of behavior.
In the case of the m-out-of-n bootstrap, where $m \to \infty$ but $m/n \to 0$,
sensitivity is somewhat reduced by using a smaller resample size. Reflecting
this restriction, in the m-out-of-n bootstrap setting, we use the following
formula to define quantities $\omega'_{jk}$, in place of the $\omega_{jk}$'s at
(3.3):

$$\theta_k = \theta_j + m^{-1/2} \omega'_{jk}. \tag{3.4}$$

It can be proved that, under regularity conditions, the sum over r of the
squared distance between the m-out-of-n bootstrap approximation to the
distribution function of $\hat r_j$, and the limiting form $G_j$ of that
distribution [see (3.9) below], equals
$C_1 m^{-1} + C_2 m n^{-1} + o(m^{-1} + m n^{-1})$, where $C_1$ and $C_2$ are
positive constants. This result implies that the asymptotically optimal choice
of m equals the integer part of $(C_1 n / C_2)^{1/2}$. However, this
limit-theoretic argument is not always valid when p is large, and even in the
case of small p it is not straightforward to estimate the ratio $C_1/C_2$. In
Section 3.4, we suggest an alternative, relatively flexible, method for
choosing m.
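The stated choice of m follows by minimizing the two leading terms of that expansion, treating m as continuous:

$$\frac{d}{dm}\Big(\frac{C_1}{m} + \frac{C_2\, m}{n}\Big) = -\frac{C_1}{m^2} + \frac{C_2}{n} = 0
\quad\Longleftrightarrow\quad m = \Big(\frac{C_1 n}{C_2}\Big)^{1/2}.$$

The second derivative, $2C_1/m^3$, is positive, so this stationary point is a minimum. The resulting m is of order $n^{1/2}$, so that $m \to \infty$ and $m/n \to 0$, as required above.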

In most cases where there are p distinct populations, it is reasonable to
argue that the datasets $\mathcal{X}_1, \dots, \mathcal{X}_p$ are independent. For
example, $\mathcal{X}_j$ might represent a sample relating to the performance of
the jth of p health providers that are being operated essentially
independently [see, e.g., Goldstein and Spiegelhalter (1996)], and the data in
$\mathcal{X}_j$ would be gathered in a way that is largely independent of data
for other health providers. To the extent to which the data are related, for
example, through the common effects of government policies, or shared
health-care challenges such as epidemics, we might interpret our analysis as
conditional on those effects.

If the assumption of independence is valid, then it is straightforward to
reflect the assumption during the resampling operation, obtaining bootstrap
parameter estimators $\hat\theta_1^*, \dots, \hat\theta_p^*$ that are independent
conditional on $\mathcal{X} = \bigcup_j \mathcal{X}_j$. If the independence
assumption is not appropriate, then resampling is generally a more complex
operation, and may be so challenging as to be impractical. In the remainder of
this section, we shall assume that $\mathcal{X}_1, \dots, \mathcal{X}_p$ are
independent, and that $\hat\theta_1^*, \dots, \hat\theta_p^*$ are independent
conditional on $\mathcal{X}$.

Sections 3.2 and 3.5 will outline theoretical properties in the case of fixed 
p and increasingly large p, respectively. To simplify and abbreviate our dis- 
cussion we shall state our main results only for one j at a time, but joint 
distribution properties can also be derived, analogous to those in Theorem 
4.1. 

3.2. Theoretical properties in the case of fixed p. To set the scene for our
results, we note first that, under mild regularity conditions, it holds true
that for fixed p, for each $1 \le j \le p$ and for each real number x,

$$P\{n^{1/2}(\hat\theta_j - \theta_j) \le \sigma_j x\} \to \Phi(x), \qquad
P\{m^{1/2}(\hat\theta_j^* - \hat\theta_j) \le \sigma_j x \mid \mathcal{X}\} \to \Phi(x), \tag{3.5}$$

where the asymptotic standard deviations $\sigma_j \in (0, \infty)$ do not depend
on n, $\Phi$ denotes the standard normal distribution function, and the
convergence in the second part of (3.5) is in probability. In that second
part, m equals n if we are using the conventional bootstrap, and equals the
resample size m if we are using the m-out-of-n bootstrap.

The first formula in (3.5) is the conventional statement that the statistics
$\hat\theta_j$ are asymptotically normally distributed, and the second is the
standard bootstrap form of that assumption. It asserts only that the bootstrap
estimator of the distribution of $n^{1/2}(\hat\theta_j - \theta_j)$ is consistent
for the normal distribution with zero mean and variance $\sigma_j^2$.

In this section, we keep p fixed as we vary n, although we permit the
distributions of the populations $\Pi_1, \dots, \Pi_p$ to depend on n. Let
$N_1, \dots, N_p$ denote independent standard normal random variables and,
given constants $c_1, \dots, c_p$, let $F_j(\cdot \mid c_1, \dots, c_p)$ denote the
distribution function of the random variable

$$1 + \sum_{k:\, k \ne j} I(\sigma_j N_j < \sigma_k N_k + c_k).$$

The value of $c_j$ has no influence on $F_j$, but it is cumbersome to reflect
this in notation.

Theorem 3.1. Assume that p is fixed and the datasets
$\mathcal{X}_1, \dots, \mathcal{X}_p$ are independent, that
$\hat\theta_1^*, \dots, \hat\theta_p^*$ are independent conditional on $\mathcal{X}$,
and that (3.1), (3.2) (if using the m-out-of-n bootstrap) and (3.5) hold. [In
(3.5), we take m = n unless using the m-out-of-n bootstrap.]

(i) For each integer r,

$$P(\hat r_j \le r) - F_j(r \mid \omega_{j1}, \dots, \omega_{jp}) \to 0 \tag{3.6}$$

as $n \to \infty$.

(ii) Using the standard n-out-of-n bootstrap, either parametric or
nonparametric, define the $\omega_{jk}$'s by (3.3). Then there exists a sequence
of random variables $Z_1, \dots, Z_p$, depending on n and being, for each choice
of n, independent and having the standard normal distribution, such that

$$P(\hat r_j^* \le r \mid \mathcal{X}) - F_j(r \mid \omega_{j1} + \sigma_1 Z_1 - \sigma_j Z_j, \dots, \omega_{jp} + \sigma_p Z_p - \sigma_j Z_j) \to 0 \tag{3.7}$$

in probability as $n \to \infty$.

(iii) In the case of the m-out-of-n bootstrap, again either parametric or
nonparametric, and for which $m/n \to 0$ and $m \to \infty$, define the
$\omega'_{jk}$'s by (3.4). Then (3.7) alters to

$$P(\hat r_j^* \le r \mid \mathcal{X}) - F_j(r \mid \omega'_{j1}, \dots, \omega'_{jp}) \to 0 \tag{3.8}$$

in probability as $n \to \infty$.
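Because $F_j$ has no closed form in general, it is convenient to approximate it by simulation when experimenting with Theorem 3.1. The Python sketch below does so; the function name and the 0-based indexing are ours. Substituting the $\omega_{jk}$ of (3.3) for the $c_k$ gives the approximation in (3.6), and substituting the $\omega'_{jk}$ of (3.4) gives that in (3.8).

```python
import random


def F_j(r, j, sigma, c, n_sim=20000, seed=0):
    """Monte Carlo estimate of F_j(r | c_1, ..., c_p) from Section 3.2.

    F_j is the distribution function of
        1 + sum_{k != j} I(sigma_j N_j < sigma_k N_k + c_k),
    with N_1, ..., N_p independent standard normal. Indices are 0-based, and
    c[j] is ignored, as in the text.
    """
    rng = random.Random(seed)
    p = len(sigma)
    hits = 0
    for _ in range(n_sim):
        N = [rng.gauss(0.0, 1.0) for _ in range(p)]
        rank = 1 + sum(
            1 for k in range(p)
            if k != j and sigma[j] * N[j] < sigma[k] * N[k] + c[k]
        )
        if rank <= r:
            hits += 1
    return hits / n_sim
```

With very large positive or negative $c_k$ the corresponding indicator is essentially always 1 or 0, which reproduces the degenerate contributions of the indices in $\mathcal{K}_+$ and $\mathcal{K}_-$ discussed next.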
3.3. Interpretation of Theorem 3.1. To illustrate the implications of the
theorem, let us assume that $\omega_{jk}$, defined by (3.3), has (for each j and
k) a well-defined limit (either finite or infinite) as $n \to \infty$, and that
$\omega_{jk} \to +\infty$ for $k \in \mathcal{K}_+$, $\omega_{jk} \to -\infty$ for
$k \in \mathcal{K}_-$, and $\omega_{jk}$ has a finite limit, $\omega_{jk}^0$ say, for
$k \in \mathcal{K}_j = \{1, \dots, p\} \setminus (\{j\} \cup \mathcal{K}_+ \cup \mathcal{K}_-)$.
(Both $\mathcal{K}_+$ and $\mathcal{K}_-$ may depend on j.) Define $G_j$ to be the
distribution function of

$$1 + (\#\mathcal{K}_+) + \sum_{k \in \mathcal{K}_j} I(\sigma_j N_j < \sigma_k N_k + \omega_{jk}^0).$$

Then $F_j(r \mid \omega_{j1}, \dots, \omega_{jp}) \to G_j(r)$, and so (3.6) implies
that, as $n \to \infty$,

$$P(\hat r_j \le r) \to G_j(r) \tag{3.9}$$

for each integer r.

Analogously to the argument leading from (3.6) to (3.9), result (3.7) implies
that, in the case of the n-out-of-n bootstrap, $P(\hat r_j^* \le r \mid \mathcal{X})$
converges in distribution to the random variable

$$P\Big[1 + (\#\mathcal{K}_+) + \sum_{k \in \mathcal{K}_j} I\{\sigma_j(N_j + N'_j) < \sigma_k(N_k + N'_k) + \omega_{jk}^0\} \le r \,\Big|\, N_1, \dots, N_p\Big], \tag{3.10}$$

where $N_1, \dots, N_p, N'_1, \dots, N'_p$ are independent standard normal random
variables. However, the convergence of $P(\hat r_j^* \le r \mid \mathcal{X})$ is not
in probability.
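The failure of convergence in probability can be seen in the simplest tied case: p = 2, $\theta_1 = \theta_2$ and $\sigma_1 = \sigma_2$. Then $\mathcal{K}_+$ is empty, $\mathcal{K}_j = \{k\}$ with $\omega_{jk}^0 = 0$, and for r = 1 the random variable in (3.10) reduces to $\Phi\{(N_j - N_k)/2^{1/2}\}$, which is uniformly distributed on (0, 1), whereas by (3.9) $P(\hat r_j \le 1) \to 1/2$. The Python sketch below checks this by simulation, assuming sample means as estimators; the function names are hypothetical.

```python
import random
from statistics import fmean


def bootstrap_prob_first(x1, x2, n_boot, rng):
    """n-out-of-n bootstrap estimate of P(population 1 is ranked first | data):
    the proportion of resamples in which mean(x1*) exceeds mean(x2*)."""
    wins = 0
    for _ in range(n_boot):
        if fmean(rng.choices(x1, k=len(x1))) > fmean(rng.choices(x2, k=len(x2))):
            wins += 1
    return wins / n_boot


def tied_experiment(n=100, n_boot=200, n_datasets=20, seed=1):
    """Two N(0, 1) populations, i.e. an exact tie with equal variances.
    Returns the bootstrap probability computed from each of n_datasets
    independently generated datasets."""
    rng = random.Random(seed)
    probs = []
    for _ in range(n_datasets):
        x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        probs.append(bootstrap_prob_first(x1, x2, n_boot, rng))
    return probs
```

Sorting the output of `tied_experiment()` typically shows values scattered over much of (0, 1), rather than concentrating near the true limit 1/2; increasing n does not change this picture.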

If $\mathcal{K}_+ \cup \mathcal{K}_- = \{1, \dots, p\} \setminus \{j\}$, which
occurs, for example, if the $\theta_k$'s are fixed and there are no ties for the
value of $\theta_j$, then it follows from (3.6) and (3.7) that
$P(\hat r_j = r_j) \to 1$ and $P(\hat r_j^* = r_j \mid \mathcal{X}) \to 1$ in
probability, where $r_j$ denotes the rank of $\theta_j$ in the set of all
$\theta_k$'s. Therefore, in this degenerate setting, the standard n-out-of-n
bootstrap correctly captures the asymptotic distribution of $\hat r_j$.

In all other cases, however, the limiting distribution of $\hat r_j$ [see
(3.9)] does not equal the limit of the n-out-of-n bootstrap distribution of
$\hat r_j^*$ [see (3.10)]. Nevertheless, it is clear from (3.9) and (3.10) that

(3.11) The support of the limiting distribution of $\hat r_j$, and the support
of the weak limit of the distribution of $\hat r_j^*$ given $\mathcal{X}$, are
identical, and both are equal to the set
$\{\#\mathcal{K}_+ + 1, \dots, \#\mathcal{K}_+ + \#\mathcal{K}_j + 1\}$.

To this extent the standard n-out-of-n bootstrap correctly captures important
aspects of the distribution of $\hat r_j$.

Superficially, (3.8) seems to imply that the m-out-of-n bootstrap overcomes
this problem. However, the $\omega'_{jk}$'s are now defined by (3.4), and are
different from the $\omega_{jk}$'s at (3.3). As a result, the m-out-of-n
bootstrap does not, in general, correctly capture the limiting distribution at
(3.9). Nevertheless, if

(3.12) for each $k \ne j$, either $m^{1/2}(\theta_k - \theta_j) \to \pm\infty$ or
$n^{1/2}(\theta_k - \theta_j) \to 0$,

then $P(\hat r_j^* \le r \mid \mathcal{X}) - P(\hat r_j \le r) \to 0$ in
probability, that is, (2.4) holds. In particular, the m-out-of-n bootstrap
consistently estimates the distribution of empirical ranks. Under condition
(3.12), the following analogue of (3.11) holds for the m-out-of-n bootstrap:

(3.13) The limiting distributions of $\hat r_j$, and of $\hat r_j^*$ conditional
on $\mathcal{X}$, are identical when using the m-out-of-n bootstrap, and the
support of each equals the set
$\{\#\mathcal{K}_+ + 1, \dots, \#\mathcal{K}_+ + \#\mathcal{K}_j + 1\}$.

Property (3.12) holds if the $\theta_k$'s are all fixed (i.e., do not depend on
n). Therefore, the m-out-of-n bootstrap correctly estimates the distribution
of ranks in the presence of ties, when the populations are kept fixed as
sample sizes diverge, and also in other cases where the differences
$\theta_k - \theta_j$ are of either strictly larger order than $m^{-1/2}$ or
strictly smaller order than $n^{-1/2}$. When (3.12) holds, the asymptotic
distribution of $\hat r_j$ is supported on a set whose size is the number of
integers k (including k = j) for which $m^{1/2}(\theta_k - \theta_j) \to 0$, that
is, $\#\mathcal{K}_j + 1$.

3.4. Methods for choosing m. Consider a comparison of two of the populations
$\Pi_j$ and $\Pi_k$, and focus on the probability of ranking one higher than the
other using the m-out-of-n bootstrap. Assuming (3.5) and letting
$c = (\sigma_j^2 + \sigma_k^2)^{-1/2}$, we see that

$$\begin{aligned}
P(\hat r_j^* < \hat r_k^* \mid \mathcal{X}) - P(\hat r_j < \hat r_k)
&= P(\hat\theta_j^* > \hat\theta_k^* \mid \mathcal{X}) - P(\hat\theta_j > \hat\theta_k) \\
&\approx \Phi\{m^{1/2} c (\hat\theta_j - \hat\theta_k)\} - \Phi\{n^{1/2} c (\theta_j - \theta_k)\} \\
&\approx \Phi\{m^{1/2} c (\theta_j - \theta_k) + (m/n)^{1/2} Z\} - \Phi\{n^{1/2} c (\theta_j - \theta_k)\} \\
&= \Phi\{(m/n)^{1/2}(-c\,\omega_{jk} + Z)\} - \Phi(-c\,\omega_{jk}).
\end{aligned}$$

Here, Z denotes a realization of a standard normal random variable, and $\Phi$
is the standard normal distribution function. Thus, choosing m to minimize the
squared difference between the bootstrapped and true probabilities is
approximately equivalent to choosing m to minimize the expression

$$[\Phi\{(m/n)^{1/2}(-c\,\omega_{jk} + Z)\} - \Phi(-c\,\omega_{jk})]^2. \tag{3.14}$$

If $\omega_{jk} \to \pm\infty$, then the expression is minimized as long as
$(m/n)^{1/2} \omega_{jk} \to \pm\infty$ too, which guarantees that $m \to \infty$ as
long as $\omega_{jk}$ is no larger than $O(n^{1/2})$. Alternatively, if
$\omega_{jk} \to 0$, then (3.14) is minimized provided $m/n \to 0$. This
discussion motivates an approach for choosing m by tuning the bootstrapped
probabilities to match the true probabilities. In reality, however, we do not
know $\omega_{jk}$, c or Z, so these must be estimated using
$\hat\omega_{jk} = n^{1/2}(\hat\theta_k - \hat\theta_j)$,
$\hat c = (\hat\sigma_j^2 + \hat\sigma_k^2)^{-1/2}$ and a randomly generated
standard normal variable, respectively. The situation is simplified if we have
a "gap" between the orders of the diverging $\omega_{jk}$'s and those
converging, such as the following:

(3.15) For each pair j, k, either $\omega_{jk} \to 0$ or
$|\omega_{jk}| (\log n)^{-1/2} \to \infty$.

Thus, we estimate m by choosing it to minimize the expression

$$\sum_j \sum_{k \ne j} \Big[\Phi\big\{(m/n)^{1/2}\big(-\hat c\,\hat\omega_{jk} (\log n)^{-1/2} + Z\big)\big\} - \Phi\big\{-\hat c\,\hat\omega_{jk} (\log n)^{-1/2}\big\}\Big]^2. \tag{3.16}$$

The following theorem, a proof of which is given in the Ph.D. thesis of the 
second author, shows that choosing m in this fashion is consistent. 

Theorem 3.2. Assume p is fixed and that (3.1), (3.2), (3.5) and (3.15) hold.
Choose m by minimizing (3.16). Then, for each j and each integer r,

$$P(\hat r_j^* \le r \mid \mathcal{X}) - P(\hat r_j \le r) \to 0$$

in probability.

While this result suggests a way of determining m, there remains some
uncertainty, since the $(\log n)^{-1/2}$ factor used is not unique in generating
good asymptotic performance. For example, replacing it with
$(\log Cn)^{-1/2}$ for some constant C would yield a similar theoretical result.
In practice, the dataset under consideration often suggests whether the
adopted factor is appropriate, and the choice of m is reasonably robust
against such changes.
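A minimal Python sketch of this tuning rule follows, under stated assumptions: $\theta_j$ and $\sigma_j$ are replaced by user-supplied estimates, n is a single (average) sample size, m is searched over a finite grid, and the criterion is the one obtained from (3.14) by substituting $\hat\omega_{jk}(\log n)^{-1/2}$ for $\omega_{jk}$, $\hat c$ for c and a simulated standard normal variable for Z, summed over pairs $j \ne k$. One Z is drawn per ordered pair and held fixed across candidate values of m. These choices, and the function names, are ours rather than the paper's.

```python
import math
import random


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def choose_m(theta_hat, sigma_hat, n, m_grid, seed=0):
    """Pick m from m_grid by minimizing the tuning criterion described above.

    theta_hat, sigma_hat: per-population estimates of theta_j and sigma_j.
    n: (average) sample size.
    """
    rng = random.Random(seed)
    p = len(theta_hat)
    scale = math.sqrt(math.log(n))
    terms = []  # (c_hat * omega_hat / sqrt(log n), Z) for each pair j != k
    for j in range(p):
        for k in range(p):
            if k == j:
                continue
            omega = math.sqrt(n) * (theta_hat[k] - theta_hat[j])
            c = 1.0 / math.sqrt(sigma_hat[j] ** 2 + sigma_hat[k] ** 2)
            terms.append((c * omega / scale, rng.gauss(0.0, 1.0)))

    def criterion(m):
        root = math.sqrt(m / n)
        return sum((Phi(root * (-a + z)) - Phi(-a)) ** 2 for a, z in terms)

    # ties in the criterion are resolved in favour of the earliest grid value
    return min(m_grid, key=criterion)
```

Holding the Z draws fixed across m (common random numbers) makes the criterion a deterministic function of m for a given seed, so the minimization is well defined. When all estimates coincide, every term grows with m and the smallest grid value is chosen, in line with the discussion of the case $\omega_{jk} \to 0$ above.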

3.5. Theoretical properties in the case of large p. The results above can be
generalized to cases where p diverges with n but the support of the limiting
distribution of $\hat r_j$ remains bounded. The defining features of those
extensions are that values of $|\theta_k - \theta_j|$, for indices k that are not
in the $\mathcal{K}_j$ of the previous section, should be at least as large as
$(n^{-1} \log n)^{1/2}$; and values of $|\theta_k - \theta_j|$, for k in
$\mathcal{K}_j$, should be at least as small as $n^{-1/2}$. We shall give results
of this type in Section 4.2. In the present section, we show how to capture,
in a theoretical model, instances where both p and the support of the
distribution of $\hat r_j$ are large. Real-data examples of this type are given
by Goldstein and Spiegelhalter (1996).
Specifically, we assume the following linear model for $\theta_j$:

$$\theta_j = a - j\varepsilon \quad \text{for } 1 \le j \le p, \tag{3.17}$$

where $a = a(n, p)$ does not depend on j, and $\varepsilon = \varepsilon(n) > 0$.
This condition ensures the simple numerical ordering
$\theta_1 > \dots > \theta_p$, which in more general contexts we can impose
without loss of generality. Assumption (3.17) also allows us to adjust the
difficulty of the empirical ranking problem by altering the size of
$\varepsilon$; the difficulty increases as $\varepsilon$ decreases.

As in Theorem 3.1, we assume that the datasets $\mathcal{X}_j$ are independent,
but now we permit p = p(n) to diverge with n. In order that Theorem 3.3 below
may be stated relatively simply, we assume that the quantities
$Z_k = n^{1/2}(\hat\theta_k - \theta_k)$ all have the same asymptotic variance
$\sigma^2$. Our main conclusion, that the standard n-out-of-n bootstrap
correctly captures order of magnitude but not constant multipliers, remains
valid as long as the limiting variances of the $Z_k$'s are bounded away from
zero and infinity.

We also assume conditions (3.18) and (3.19) below. In cases where each
$\theta_j$ is a quantity such as a mean, a quantile, or any one of many different
robust measures of location, those conditions follow from moderate-deviation
properties of sums of independent random variables, provided the data have
sufficiently many finite moments and p does not diverge too rapidly as a
function of n:

$$P\{n^{1/2}(\hat\theta_k - \theta_k) \le \sigma x\} = \Phi(x)\{1 + o(1)\} + o(p^{-1} n^{-1/2} \varepsilon^{-1}), \tag{3.18}$$

uniformly in $|x| = O(p n^{1/2} \varepsilon)$ and in $1 \le k \le p$, as
$n \to \infty$, where $\sigma > 0$;

$$P\{n^{1/2}(\hat\theta_k^* - \hat\theta_k) \le \sigma x \mid \mathcal{X}\} = \Phi(x)\{1 + o_p(1)\} + o_p(p^{-1} n^{-1/2} \varepsilon^{-1}), \tag{3.19}$$

uniformly in $|x| = O(p n^{1/2} \varepsilon)$ and in $1 \le k \le p$, as
$n \to \infty$, where $\sigma$ is as in (3.18).

In order for (3.18) and (3.19) to hold as p increases, the value of
$\varepsilon$ should decrease as a function of p, that is, the empirical ranking
problem should be made more difficult for larger values of p. Define
$\delta = (n/2)^{1/2} \varepsilon / \sigma$, where $\sigma > 0$ is as in (3.18) and
(3.19), and put
$u_{jk} = n^{1/2}\{\hat\theta_k - \theta_k - (\hat\theta_j - \theta_j)\}/(2^{1/2}\sigma)$.



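Theorem 3.3. Assume that the datasets $\mathcal{X}_1, \dots, \mathcal{X}_p$ are
independent, that $\hat\theta_1^*, \dots, \hat\theta_p^*$ are independent
conditional on $\mathcal{X}$, that (3.17)-(3.19) hold, and that
$p = p(n) \to \infty$ and $\varepsilon = \varepsilon(n) \downarrow 0$ as n increases,
in such a manner that $n^{1/2} \varepsilon \to 0$ and $p n^{1/2} \varepsilon \to \infty$.
Then

$$E(\hat r_j) = \delta^{-1} \int_{-j\delta}^{\infty} \Phi(-x)\, dx + o(\delta^{-1}), \tag{3.20}$$

$$E(\hat r_j^* \mid \mathcal{X}) = \{1 + o_p(1)\} \sum_{k:\, k \ne j} \Phi\{u_{jk} + (j - k)\delta\} + o_p(\delta^{-1}), \tag{3.21}$$

uniformly in $1 \le j \le C/(n^{1/2} \varepsilon)$ for any $C > 0$.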

The implications of Theorem 3.3 can be seen most simply when j is fixed,
although other cases are similar. For any fixed j, it follows from (3.20) and
(3.21) that

$$E(\hat r_j) \sim C \delta^{-1}, \tag{3.22}$$

$$E(\hat r_j^* \mid \mathcal{X}) \sim_p \delta^{-1} \int_{-\infty}^{\infty} d\Phi(z) \int_0^{\infty} \Phi(W_j - x - z\, 2^{-1/2})\, dx, \tag{3.23}$$

where $C = \int_{x > 0} \Phi(-x)\, dx$, $W_j = -(n/2)^{1/2}(\hat\theta_j - \theta_j)/\sigma$,
$a_n \sim b_n$ for constants $a_n$ and $b_n$ means that $a_n / b_n \to 1$, and
$A_n \sim_p B_n$ for random variables $A_n$ and $B_n$ means that
$A_n / B_n \to 1$ in probability. Results (3.22) and (3.23) reflect the highly
variable character of $\hat r_j$ in the difficult cases represented by the
model (3.17). For example, $r_j = j$, which of course is fixed if j is fixed,
yet both $E(\hat r_j)$ and $E(\hat r_j^* \mid \mathcal{X})$ are of size $\delta^{-1}$,
which diverges to infinity as $n \to \infty$. That is, despite $r_j$ being
fixed, $\hat r_j$ tends to be so large that its expected value diverges. Similar
arguments show that $\mathrm{var}(\hat r_j \mid \mathcal{X}_j)$ and
$\mathrm{var}(\hat r_j^* \mid \mathcal{X}, \mathcal{X}_j^*)$ are both of size
$\delta^{-1}$.
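Under the idealization, assumed here and not required by Theorem 3.3, that the estimators $\hat\theta_k$ are independent and exactly normal with variance $\sigma^2/n$, model (3.17) gives $P(\hat\theta_k > \hat\theta_j) = \Phi\{(j - k)\delta\}$, so that $E(\hat r_j) = 1 + \sum_{k \ne j} \Phi\{(j - k)\delta\}$. The short Python computation below evaluates this sum, which can be compared with (3.22); here $C = \int_0^\infty \Phi(-x)\, dx = (2\pi)^{-1/2}$, since it is the mean of the positive part of a standard normal variable. The function names are hypothetical.

```python
import math


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_rank(j, p, delta):
    """E(r_hat_j) under model (3.17) when each theta_hat_k is exactly
    N(theta_k, sigma^2 / n) and the estimators are independent, so that
    P(theta_hat_k > theta_hat_j) = Phi((j - k) * delta),
    with delta = (n / 2)^{1/2} * eps / sigma. Indices run from 1 to p."""
    return 1.0 + sum(Phi((j - k) * delta) for k in range(1, p + 1) if k != j)


# C = integral_0^infinity Phi(-x) dx = E[max(N, 0)] = (2 pi)^{-1/2}
C = 1.0 / math.sqrt(2.0 * math.pi)
```

With delta = 0.01 and p = 10**5, `0.01 * expected_rank(1, 10**5, 0.01)` is close to C (about 0.399), and halving delta roughly doubles the expected empirical rank of the top population, even though its true rank remains 1.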

It is clear from (3.22) and (3.23) that the standard n-out-of-n bootstrap
correctly captures the order of magnitude of $E(\hat r_j)$, but does not get the
constant multiplier right. Similar arguments, based on elementary properties
of sums of independent random variables, show that the standard bootstrap
produces a prediction interval for $r_j$ for which the length has the correct
order of magnitude, but again the constant multiplier is not correct. The
m-out-of-n bootstrap gets both the order of magnitude and the constant right,
but at the expense of more restrictive conditions on $\varepsilon$; one could
predict from Theorem 3.1 that this would be the case. It is also possible to
establish a central limit theorem describing properties of $E(\hat r_j)$ and
$E(\hat r_j^* \mid \mathcal{X})$. However, since the limitations of the bootstrap
are clear at a coarser level than that type of analysis would address, we
shall not give those results here.

3.6. Numerical properties. We present numerical work which reinforces and complements the theoretical issues discussed above. In our first set of simulations, we observe $n$ independent data vectors $(X_{i1}, \ldots, X_{i,10})$, where the $X_{ij}$'s are independent and are, respectively, distributed as normal $N(\theta_j, 1)$. First, we consider the case where $\theta_j = 1 - (j/10)$, implying that the means are evenly spaced and do not depend on $n$. Although this model appears straightforward, the gaps between means are one tenth of the noise standard deviation, and so significant ranking challenges are present. However, Figure 1 shows that this is a case that the standard n-out-of-n bootstrap can handle satisfactorily, with the 90% prediction intervals for the estimated ranks shrinking as $n$ grows.
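A minimal sketch of this first simulation, assuming percentile intervals for the 90% prediction intervals and column means as the estimators $\hat\theta_j$. The helper names `rank_desc` and `bootstrap_rank_intervals`, the number of resamples `B`, the seeds and the extra sample size $n = 1600$ are our illustrative choices, not the settings behind Figure 1.

```python
import numpy as np


def rank_desc(values):
    """Ranks along the last axis, with rank 1 for the largest value."""
    return np.argsort(np.argsort(-values, axis=-1), axis=-1) + 1


def bootstrap_rank_intervals(X, m=None, B=1000, level=0.90, rng=None):
    """Percentile prediction intervals for the ranks of the column means of X.

    X is an (n, p) array. Whole rows are resampled (synchronous bootstrap);
    m=None gives the standard n-out-of-n bootstrap.
    """
    rng = np.random.default_rng(rng)
    n, p = X.shape
    m = n if m is None else m
    boot_ranks = np.empty((B, p), dtype=int)
    for b in range(B):
        idx = rng.integers(0, n, size=m)          # resample m rows with replacement
        boot_ranks[b] = rank_desc(X[idx].mean(axis=0))
    tail = (1 - level) / 2
    lo = np.quantile(boot_ranks, tail, axis=0)
    hi = np.quantile(boot_ranks, 1 - tail, axis=0)
    return lo, hi


# Evenly spaced means theta_j = 1 - j/10 with unit-variance normal noise.
p = 10
theta = 1 - np.arange(1, p + 1) / 10
data_rng = np.random.default_rng(0)
avg_width = {}
for n in (100, 200, 1600):
    X = theta + data_rng.standard_normal((n, p))
    lo, hi = bootstrap_rank_intervals(X, B=400, rng=1)
    avg_width[n] = float(np.mean(hi - lo))
print(avg_width)  # average interval width should shrink as n grows
```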
Nevertheless, our theory suggests that the n-out-of-n bootstrap will fail to correctly estimate the distribution in cases where the values of $\theta_j$ are relatively close. To investigate this issue, we took $\theta_j = 1$ for $j \in \{1, 2, 3, 4, 5\}$, and $\theta_j = 0$ otherwise. Then in our bootstrap replicates, we would expect $\hat r_j^*$, conditional on the data, to be approximately uniformly distributed on either the top five positions (in the case $j \le 5$) or the bottom five (when $j \ge 6$). Figure 2 shows the difference in distributions for a simulation with $n = 1000$ and two choices of $m$. For each variable, the shading intensities in that column show the relative empirical distributions across ranks. Here the m-out-of-n bootstrap, with $m = 300$, produces distributions closer to the truth, in which each of the top-left and bottom-right regions would have exactly equal intensities everywhere.
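The comparison in Figure 2 can be mimicked with a short sketch. It measures, for each simulated dataset, the squared distance between the bootstrap rank probabilities of the five tied leaders and the uniform value $1/5$, and averages over datasets. The helper names, the distance measure, `B`, the number of datasets and the seed are our assumptions for illustration, not the authors' code.

```python
import numpy as np


def ranks_desc(v):
    """Ranks of a 1-D array, rank 1 for the largest entry."""
    return np.argsort(np.argsort(-v)) + 1


def boot_rank_probs(X, m, B, rng):
    """P[j, r - 1] estimates P(r*_j = r | data) under the m-out-of-n bootstrap."""
    n, p = X.shape
    counts = np.zeros((p, p))
    for _ in range(B):
        idx = rng.integers(0, n, size=m)          # m rows, with replacement
        r = ranks_desc(X[idx].mean(axis=0))
        counts[np.arange(p), r - 1] += 1
    return counts / B


def top5_distance(P):
    """Squared distance of the top-left 5 x 5 block from the uniform value 1/5."""
    return float(np.sum((P[:5, :5] - 0.2) ** 2))


rng = np.random.default_rng(2009)
theta = np.r_[np.ones(5), np.zeros(5)]    # perfect ties within each group of five
n, B, reps = 1000, 300, 10
dist = {n: [], 300: []}
for _ in range(reps):
    X = theta + rng.standard_normal((n, theta.size))
    for m in (n, 300):
        dist[m].append(top5_distance(boot_rank_probs(X, m, B, rng)))
avg = {m: float(np.mean(v)) for m, v in dist.items()}
print(avg)  # expect a smaller average distance for m = 300 than for m = n
```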



Fig. 1. Ranking 90% prediction intervals for the case of fixed $\theta_j$ (panels: $n = 100$ and $n = 200$; horizontal axis: variables X1, ..., X10).
Fig. 2. Distribution of ranks in the presence of ties (two panels; horizontal axis: Variable, X1, ..., X10).

The case of perfect ties demonstrates the advantages of the m-out-of-n bootstrap. In more subtle settings, when the $\theta_j$'s vary with $n$ and are not exactly tied, we are interested in the ability of the bootstrap to distinguish $\theta_j$'s for which the absolute differences $|\theta_j - \theta_k|$ are relatively small. The theory suggests considering differences of size $m^{-\alpha}$, where $\alpha = 1/2$ is the critical value: lower values of $\alpha$ tend toward a (degenerate) perfect separation of ranks, and higher values asymptotically behave as though $\theta_j$ and $\theta_k$ were tied. Therefore, the next set of simulations had the $\theta_j$'s equally spaced and uniformly decreasing, with $\theta_j - \theta_{j+1} = 0.2(10/m)^{\alpha}$. Here $m$ was taken to be $\min(10n^{1/2}, n)$. Figure 3 shows, for a given pair $(\alpha, n)$, the average number of ranks contained within the 90% rank prediction interval. The results accord with the theory; cases where $\alpha < 0.5$ tend toward perfect separation (an average of 1), and cases where $\alpha > 0.5$ tend toward completely random ordering (an average of 10). Situations where $\alpha$ is closer to 0.5 diverge more slowly, and the behavior when $\alpha = 0.5$ depends on the exact situation; in our simulations the degree of tuning has ensured that the case where $\alpha = 0.5$ does not show much tendency toward either extreme.
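Why $\alpha = 1/2$ is critical can be seen by standardizing the gap by the bootstrap noise scale $m^{-1/2}$: $m^{1/2} \cdot 0.2(10/m)^{\alpha} = 0.2 \cdot 10^{\alpha} m^{1/2 - \alpha}$. A minimal sketch of the design follows; the function name `spacing_design` is hypothetical, and rounding $10n^{1/2}$ down to an integer is our assumption, since the text does not say how $m$ was rounded.

```python
import math


def spacing_design(n, alpha):
    """Resample size m and common gap theta_j - theta_{j+1} for the Figure 3 runs.

    m = min(10 n^{1/2}, n), rounded down (our assumption); gap = 0.2 (10/m)^alpha.
    """
    m = min(int(10 * math.sqrt(n)), n)
    gap = 0.2 * (10 / m) ** alpha
    return m, gap


# The standardized gap m^{1/2} * gap = 0.2 * 10^alpha * m^{1/2 - alpha} grows with m
# when alpha < 1/2, stays constant when alpha = 1/2, and shrinks when alpha > 1/2.
for alpha in (0.3, 0.5, 0.7):
    scaled = []
    for n in (400, 1600, 6400):
        m, gap = spacing_design(n, alpha)
        scaled.append(round(math.sqrt(m) * gap, 3))
    print(alpha, scaled)
```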

It is important to understand the distributional bias seen in the n-out-of-n bootstrap. One way to do this is to explore the distribution implied by (3.7). That distribution depends on the realization of the standard normal random variables $Z_1, \ldots, Z_p$. Figure 4 shows how the distribution of rankings
varies with $Z_1$ for the special case of five variables, with $\omega_1 = \cdots = \omega_5 = 0$ and $Z_2 = Z_3 = Z_4 = Z_5 = 0$. Here, as $|Z_1|$ departs from 0, the ranking distribution
is upset in two key ways. First, the average ranking is biased; for example, when $Z_1 = 1$ the average observed rank is 1.95 instead of 3, the average observed rank in the true underlying distribution obtained when $Z_1 = 0$. Second, the variation of the observed rank is reduced; the variance is 1.4 when $|Z_1| = 1$, compared with 2 in the true distribution. These two effects combine to give overconfidence in the n-out-of-n bootstrap when it is not warranted.

Fig. 3. Behavior of prediction interval widths for various $\alpha$.




Fig. 4. Distribution of ranks for various $Z_1$ (horizontal axis: $Z_1$ from -1 to 1).
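The two effects can be reproduced exactly rather than by simulation, assuming the limit has the form shown in (5.2), $\hat r_1^* \approx 1 + \sum_{k \ge 2} I(Z_1 + Z_1^* < Z_k^*)$, which holds with equal $\sigma$'s, no mean gaps and $Z_2 = \cdots = Z_5 = 0$. Conditional on $Z_1^* = z$, the count of variables that overtake variable 1 is binomial, so one numerical integral over $z$ gives the whole rank distribution. The function names are hypothetical.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rank_dist_given_z1(z1, p=5, lim=10.0, steps=4000):
    """Limiting bootstrap distribution of the rank of variable 1, given Z_1 = z1.

    Assumes equal sigmas, no mean gaps and Z_2 = ... = Z_p = 0, so that
    r*_1 = 1 + sum_{k >= 2} I(z1 + Z*_1 < Z*_k). Given Z*_1 = z, the count is
    Binomial(p - 1, q) with q = 1 - Phi(z1 + z); we integrate over z numerically.
    """
    h = 2 * lim / steps
    probs = [0.0] * p
    for i in range(steps + 1):
        z = -lim + i * h
        weight = h * math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        q = 1.0 - norm_cdf(z1 + z)
        for c in range(p):                    # rank = c + 1
            probs[c] += weight * math.comb(p - 1, c) * q**c * (1 - q) ** (p - 1 - c)
    return probs


def mean_var(probs):
    mean = sum((r + 1) * pr for r, pr in enumerate(probs))
    var = sum((r + 1 - mean) ** 2 * pr for r, pr in enumerate(probs))
    return mean, var


for z1 in (0.0, 0.5, 1.0):
    mu, v = mean_var(rank_dist_given_z1(z1))
    print(f"Z_1 = {z1:.1f}: mean rank {mu:.2f}, variance {v:.2f}")
```

At $Z_1 = 0$ the distribution is exactly uniform on $\{1, \ldots, 5\}$ (mean 3, variance 2). At $Z_1 = 1$ the mean is $1 + 4\{1 - \Phi(2^{-1/2})\} \approx 1.96$ and the variance is about 1.4, in line with the figures quoted above.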
We now move to a real-data example. A service seeking to assist parents to choose secondary schools in the state of NSW, Australia, ranks 75 schools using the number of credits achieved in final-year Higher School Certificate exams as a percentage of the number of exams sat. While there are clearly significant problems with such a simple statistic [see Goldstein and Spiegelhalter (1996)], the main one being that it ignores prior student ability, it would still be useful to give some indication of the variability of the rankings. Here $n_j$ represents the number of exams sat at school $j$, and $X_{ij}$, for $1 \le i \le n_j$, is an indicator variable for whether a credit was achieved in exam $i$. Then $\theta_j = E(X_{ij})$ and $\hat\theta_j = n_j^{-1} \sum_i X_{ij}$. Figure 5 shows 95% prediction intervals for the ranks using the n-out-of-n bootstrap. It is clear that caution needs to be exercised when interpreting the intervals, whose average width exceeds 14 places. However, we know that the n-out-of-n bootstrap ranking understates the true uncertainty, which would be better captured using the m-out-of-n bootstrap. Figure 6 shows the results using $m_j = \lfloor n_j \times 35.5\% \rfloor$. The percentage here was chosen using the approach discussed in Section 3.4, attempting to minimize the squared error between the bootstrap and real ranking distributions. Observe that the prediction intervals are now markedly longer (58% longer on average); the widest prediction interval now covers 81% of the possible rankings. Our theoretical results argue that these longer widths give a better indication of the true uncertainty associated with the ranking.
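A minimal sketch of the school analysis on synthetic data (the NSW counts are not reproduced here, and all numbers below are made up for illustration). It relies on one exact simplification: resampling $m_j$ of school $j$'s 0/1 exam outcomes with replacement yields a Binomial$(m_j, \hat\theta_j)$ credit count. The function name, `B` and the percentile intervals are our assumptions.

```python
import numpy as np


def school_rank_intervals(credits, exams, frac=1.0, B=2000, level=0.95, rng=None):
    """Percentile bootstrap prediction intervals for school ranks (1 = best).

    credits[j] and exams[j] give theta_hat_j = credits[j] / exams[j]. Resampling
    m_j of school j's exams with replacement is equivalent to a
    Binomial(m_j, theta_hat_j) count, with m_j = floor(frac * n_j);
    frac = 1 gives the n-out-of-n bootstrap.
    """
    rng = np.random.default_rng(rng)
    credits, exams = np.asarray(credits), np.asarray(exams)
    theta_hat = credits / exams
    m = np.maximum(np.floor(frac * exams).astype(int), 1)
    props = rng.binomial(m, theta_hat, size=(B, m.size)) / m
    # Break ties between equal proportions at random (the jitter is far smaller
    # than any gap between distinct proportions).
    props = props + 1e-9 * rng.random(props.shape)
    ranks = np.argsort(np.argsort(-props, axis=1), axis=1) + 1
    tail = (1 - level) / 2
    return np.quantile(ranks, tail, axis=0), np.quantile(ranks, 1 - tail, axis=0)


# Synthetic stand-in for the 75 schools (NOT the NSW data).
rng = np.random.default_rng(0)
exams = rng.integers(200, 1500, size=75)
credits = rng.binomial(exams, rng.uniform(0.3, 0.7, size=75))
for frac in (1.0, 0.355):
    lo, hi = school_rank_intervals(credits, exams, frac=frac, rng=1)
    print(f"m_j = floor({frac} n_j): average width {np.mean(hi - lo):.1f}")
```

With `frac=0.355` the intervals widen, as in the move from Figure 5 to Figure 6. The size of the increase depends entirely on the made-up data.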
Fig. 5. School ranking prediction intervals for n-out-of-n bootstrap.

Fig. 6. School ranking prediction intervals for m-out-of-n bootstrap with $m_j$ equal to 35.5% of $n_j$.



4. Properties in cases where the data come as independent p-vectors. 

4.1. Motivation for the independent-component bootstrap. In this section, we argue that when vector components are not strongly dependent, the standard, "synchronous" bootstrap may distort relationships among components, particularly in the setting of conditional inference and when $p$ is large. In such cases, even if the assumption of independent components is not strictly correct, it may be advantageous to apply the bootstrap as though independence prevailed. We refer to this working assumption as that of "component-wise independence."

We treat the case where the data arise via a sample $\mathcal{X} = \{X_1, \ldots, X_n\}$ of independent $p$-vectors. Here $X_i = (X_{i1}, \ldots, X_{ip})$, and $\mathcal{X}_j = \{X_{1j}, \ldots, X_{nj}\}$ denotes the set of $j$th components. The conventional, synchronous form of the nonparametric bootstrap involves the following resampling algorithm:

(4.1) Draw a resample $\mathcal{X}^* = \{X_1^*, \ldots, X_m^*\}$ by sampling randomly, with replacement, from $\mathcal{X}$; write $X_i^* = (X_{i1}^*, \ldots, X_{ip}^*)$ and take $\mathcal{X}_j^* = \{X_{1j}^*, \ldots, X_{mj}^*\}$.

We can view $\mathcal{X}_j^*$ as the resample drawn from the $j$th "population." In (4.1), we take $m \le n$, thereby allowing for the m-out-of-n bootstrap.
We argue that this bootstrap method is not always satisfactory in problems where ranking is involved. One reason is the following:

(4.2) If the data have a continuous distribution, then knowing the dataset $\mathcal{X}_j^*$ conveys perfect information about which data vectors $X_i$ are included in $\mathcal{X}^*$, defined in (4.1), and with what frequencies. Hence, knowing $\mathcal{X}_j^*$ tells us $\mathcal{X}_k^*$ for each $k$, and in particular the resamples $\mathcal{X}_1^*, \ldots, \mathcal{X}_p^*$ cannot be regarded as independent, conditional on $\mathcal{X}$, even if the vector components are independent.

This result holds for the m-out-of-n bootstrap as well as for the standard, synchronous bootstrap, and so the problems to which it leads cannot be alleviated simply by passing to a smaller resample size.
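The mechanism behind (4.2) can be illustrated with a small sketch on simulated continuous data; it is our own illustration and not from the paper, and the sizes and seed are arbitrary. From the resampled $j$th component alone it recovers the shared row indices, and hence every other resampled component. Consequently, given $\mathcal{X}$ and $\mathcal{X}_j^*$, each comparison $\hat\theta_j^* < \hat\theta_k^*$ is already settled.

```python
import numpy as np

rng = np.random.default_rng(42)
n, p, m = 30, 4, 30
X = rng.standard_normal((n, p))           # continuous data: distinct values (a.s.)

# Synchronous bootstrap (4.1): one set of row indices shared by all components.
idx = rng.integers(0, n, size=m)
X_star = X[idx]

# Suppose we observe only the resampled j-th component X*_j.
j = 0
lookup = {value: i for i, value in enumerate(X[:, j])}
recovered_idx = np.array([lookup[v] for v in X_star[:, j]])

# Distinct values identify the rows, so every other component is determined.
X_star_rebuilt = X[recovered_idx]
theta_star = X_star_rebuilt.mean(axis=0)
# Given X and X*_j, each comparison theta*_j < theta*_k is already settled:
pi_star = (theta_star[j] < np.delete(theta_star, j)).astype(float)
print(np.array_equal(recovered_idx, idx), np.array_equal(X_star_rebuilt, X_star), pi_star)
```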

To elucidate the consequences of (4.2), note that the $j$th empirical rank $\hat r_j$, and its bootstrap version $\hat r_j^*$, can be written as

$$\hat r_j = 1 + \sum_{k:\,k \ne j} I(\hat\theta_j < \hat\theta_k), \qquad \hat r_j^* = 1 + \sum_{k:\,k \ne j} I(\hat\theta_j^* < \hat\theta_k^*), \tag{4.3}$$

respectively. Here, $\hat r_j$ and $\hat r_j^*$ are as at (2.2) and (2.3). We wish to estimate aspects of the distribution of $\hat r_j$. For example, we might seek an estimator of the variance of the conditional mean, $u_j = E(\hat r_j \mid \mathcal{X}_j) = 1 + \sum_{k:\,k \ne j} \pi_{jk}$, of $\hat r_j$ given $\mathcal{X}_j$; or we might wish to approximate the variance of $\hat r_j$. [To derive the formula for $u_j$, we used the first part of (4.3), and took $\pi_{jk} = P(\hat\theta_j < \hat\theta_k \mid \mathcal{X}_j)$.] Undertaking conditional inference is particularly attractive in problems where $p$ is large, because it has the potential to greatly reduce variability, from $O(p^2)$ (the order of the unconditional variance of $\hat r_j$) to $O(p)$ (the order of the variance of $\hat r_j$, conditional on $\mathcal{X}_j$, if the components are sufficiently weakly dependent).

The bootstrap version of $u_j$ can be computed using the second formula in (4.3): $u_j^* = E(\hat r_j^* \mid \mathcal{X}, \mathcal{X}_j^*) = 1 + \sum_{k:\,k \ne j} \pi_{jk}^*$, where $\mathcal{X} = \bigcup_k \mathcal{X}_k$ and $\pi_{jk}^* = P(\hat\theta_j^* < \hat\theta_k^* \mid \mathcal{X}, \mathcal{X}_j^*)$. If we use the synchronous bootstrap algorithm at (4.1), then it follows from (4.2) that $\pi_{jk}^* = I(\hat\theta_j^* < \hat\theta_k^*)$. Because the probability has degenerated to an indicator function, even when using the m-out-of-n bootstrap, and in the conventional setting of fixed $p$ and increasing $n$, $\mathrm{var}(u_j^* \mid \mathcal{X}) - \mathrm{var}(u_j)$ fails to converge to zero except in degenerate cases.

The errors can become still more pronounced if $p$ diverges with $n$. Indeed, in the problem of estimating

$$\mathrm{var}(u_j) = \sum_{k_1:\,k_1 \ne j} \; \sum_{k_2:\,k_2 \ne j} \mathrm{cov}(\pi_{jk_1}, \pi_{jk_2})$$

using

$$\mathrm{var}(u_j^* \mid \mathcal{X}) = \sum_{k_1:\,k_1 \ne j} \; \sum_{k_2:\,k_2 \ne j} \mathrm{cov}(\pi_{jk_1}^*, \pi_{jk_2}^* \mid \mathcal{X}), \tag{4.4}$$

and in the context of component-wise independence, the synchronous bootstrap at (4.1) introduces correlation terms of size $n^{-1/2}, n^{-1}, \ldots$; those terms would be zero if the bootstrap algorithm correctly reflected component-wise independence. If $p$ is much larger than $n$, then the impact of the extraneous terms is magnified by the summations over $k_1$ and $k_2$ in (4.4). These problems, too, persist when employing the m-out-of-n bootstrap.

The situation improves significantly if, instead of using the synchronous bootstrap at (4.1), we employ the following independent-component resampling algorithm:

(4.5) Compute $\mathcal{X}_j^* = \{X_{1j}^*, \ldots, X_{mj}^*\}$ by sampling randomly, with replacement, from $\mathcal{X}_j = \{X_{1j}, \ldots, X_{nj}\}$; and do this independently for each $j$.

In this case, when using the m-out-of-n bootstrap and working under the assumption of component-wise independence, $\mathrm{var}(u_j^* \mid \mathcal{X}) - \mathrm{var}(u_j)$ converges to zero as $n$ diverges, and the bothersome $n^{-1/2}$ terms that arise when estimating $\mathrm{var}(u_j)$ using the synchronous bootstrap vanish. To summarize, under component-wise independence the independent-component bootstrap, defined at (4.5), corrects significant errors that can be committed by the synchronous bootstrap algorithm at (4.1).
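A minimal sketch of the independent-component algorithm (4.5), together with a Monte Carlo approximation to $u_j^* = 1 + \sum_{k \ne j} \pi_{jk}^*$. Column means play the role of $\hat\theta_j$, and the function names, resample sizes and data are hypothetical. Because (4.5) makes the other columns independent of $\mathcal{X}_j^*$ given $\mathcal{X}$, they can be resampled afresh for each fixed $\mathcal{X}_j^*$. Under the synchronous scheme (4.1) this step would be degenerate.

```python
import numpy as np


def independent_component_resample(X, m, rng):
    """(4.5): resample each column of X separately, with replacement."""
    n, p = X.shape
    idx = rng.integers(0, n, size=(m, p))        # an independent index set per column
    return X[idx, np.arange(p)]                  # entry (i, c) is X[idx[i, c], c]


def u_star_draws(X, j, m, B_outer, B_inner, rng):
    """Monte Carlo draws of u*_j = E(r*_j | X, X*_j) using column means.

    Each outer step fixes a resample X*_j; the other columns are then
    resampled B_inner times, which (4.5) permits because they are independent
    of X*_j given X. Returns one estimate of 1 + sum_k pi*_jk per outer step.
    """
    n, p = X.shape
    out = np.empty(B_outer)
    for b in range(B_outer):
        theta_j = X[rng.integers(0, n, size=m), j].mean()     # theta*_j from X*_j
        others = np.array([independent_component_resample(X, m, rng).mean(axis=0)
                           for _ in range(B_inner)])          # (B_inner, p)
        beats = others > theta_j                              # I(theta*_j < theta*_k)
        beats[:, j] = False                                   # exclude k = j
        out[b] = 1 + beats.mean(axis=0).sum()
    return out


rng = np.random.default_rng(3)
n, p = 40, 8
X = np.linspace(0.2, 0.0, p) + rng.standard_normal((n, p))
u = u_star_draws(X, j=0, m=20, B_outer=50, B_inner=100, rng=rng)
print(u.mean(), u.var())   # u.var() roughly approximates var(u*_j | X)
```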

Importantly, similar conclusions are also reached in cases where $p$ is large and the component vectors $(X_{1j}, \ldots, X_{nj})$ are not independent. In particular, if the dependence among components is sufficiently weak to ensure that the asymptotic distribution of $\hat r_j$ is identical to what it would be if the components were independent, then the independent-component bootstrap has obvious attractions. For example, in inferential problems involving conditioning on $\mathcal{X}_j$, it gives statistical consistency in contexts where the synchronous bootstrap does not. This can happen even under conditions of reasonably strong dependence, simply because the highly ranked components are lagged well apart. Details will be outlined in the first paragraph of Section 4.3.

4.2. Theoretical properties. We address only the $j_0$ highest-ranked populations, which for notational convenience we take to be those with indices $j = 1, \ldots, j_0$, and we take the ranks of these populations to be virtually tied, so that the limiting distribution of $\hat r_j$ is nondegenerate. Also, we allow both $p$ and the distributions of $\Pi_1, \ldots, \Pi_p$ to depend on $n$. In particular, we assume that:

$$n^{1/2}(\theta_1 - \theta_j) \to 0 \quad \text{for } j = 1, \ldots, j_0, \tag{4.6}$$

$$p = o(n^{C_1}) \quad \text{for some } C_1 > 0. \tag{4.7}$$
To determine the limiting distribution of $\hat r_j$, we further suppose that

$$(n/\log n)^{1/2} \inf_{j_0 < j \le p} (\theta_1 - \theta_j) \to \infty \tag{4.8}$$

and

(4.9) the random variables $n^{1/2}(\hat\theta_j - \theta_j)$, for $1 \le j \le j_0$, are asymptotically independent and normally distributed with zero means and respective variances $\sigma_j^2$; and, for $C_2 > 0$ sufficiently large,
$$\sup_{j \le p} P\{|\hat\theta_j - \theta_j| > C_2 (n^{-1}\log n)^{1/2}\} = O(n^{-C_1}).$$

When discussing the efficacy of the m-out-of-n bootstrap we ask, instead of (4.6), (4.8) and (4.9), that

$$m^{1/2}(\theta_1 - \theta_j) \to 0 \quad \text{for } j = 1, \ldots, j_0, \tag{4.10}$$

$$(m/\log m)^{1/2} \inf_{j_0 < j \le p} (\theta_1 - \theta_j) \to \infty, \tag{4.11}$$

(4.12) conditional on $\mathcal{X}$, the random variables $m^{1/2}(\hat\theta_j^* - \hat\theta_j)$, for $1 \le j \le j_0$, are asymptotically independent and normally distributed with zero means and respective variances $\sigma_j^2$; and, for $C_2 > 0$ sufficiently large,
$$\sup_{j \le p} P\{|\hat\theta_j^* - \hat\theta_j| > C_2 (m^{-1}\log m)^{1/2}\} = O(n^{-C_1}).$$
For example, the last parts of (4.9) and (4.12) hold if $\theta_j$ and $\hat\theta_j$ are, respectively, population and sample means, if the associated population variances are bounded away from zero, and if the supremum over $j$ of absolute moments of order $C_3$, for the population $\Pi_j$, is bounded for a sufficiently large $C_3 > 0$. See Rubin and Sethuraman (1965) and Amosova (1972). Likewise, (4.9) and (4.12) also apply in cases where each $\theta_j$ is a quantile or any one of many different robust measures of location. The first part of (4.9) is a standard central limit theorem for the estimators $\hat\theta_j$, and so is a weak assumption. In (4.12), we do not specify using the independent-component bootstrap [see (4.5)], but if we do impose that condition then the first part of (4.12) is a conventional central limit theorem for the m-out-of-n bootstrap, and in that setting we do not need to assume independence of the asymptotic normal distributions of the variables $m^{1/2}(\hat\theta_j^* - \hat\theta_j)$; it follows from the nature of the independent-component bootstrap.

Theorem 4.1. Let $1 \le j \le j_0$.

(i) If (4.6)-(4.9) hold, then the ranks $\hat r_1, \ldots, \hat r_{j_0}$ are asymptotically jointly distributed as $R_1, \ldots, R_{j_0}$, where

$$R_j = 1 + \sum_{k:\,k \le j_0,\,k \ne j} I(Z_j\sigma_j < Z_k\sigma_k) \tag{4.13}$$

and $Z_1, \ldots, Z_{j_0}$ are independent and normal $N(0, 1)$.
(ii) Assume (4.7) and (4.9)-(4.12), and use the m-out-of-n bootstrap (where $m/n \to 0$ and $m \to \infty$ as $n \to \infty$), in either the conventional form at (4.1) or the component-wise form at (4.5). Then the distribution of $(\hat r_1^*, \ldots, \hat r_{j_0}^*)$, conditional on the data $\mathcal{X}$, converges in probability to the distribution of $(R_1, \ldots, R_{j_0})$.

(iii) Assume (4.7) and (4.9)-(4.12), use the m-out-of-n bootstrap with $m/n \to 0$ and $m \to \infty$ as $n \to \infty$, and implement the bootstrap component-wise, as in (4.5). Then the distribution of $u_j^*$, conditional on $\mathcal{X}$, is consistent for that of $u_j$. That is,

$$P\{E(\hat r_j^* \mid \mathcal{X}, \mathcal{X}_j^*) \le x \mid \mathcal{X}\} \to P\{E(R_j \mid Z_j) \le x\} \tag{4.14}$$

in probability, for all continuity points $x$ of the cumulative distribution function $P\{E(R_j \mid Z_j) \le x\}$. Moreover, $\mathrm{var}(\hat r_j^* \mid \mathcal{X}) \to \mathrm{var}(R_j)$ in probability.
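The limits in Theorem 4.1 are easy to evaluate. Since the $Z_k$ are independent, $E(R_j \mid Z_j = z) = 1 + \sum_{k \le j_0,\,k \ne j} \{1 - \Phi(z\sigma_j/\sigma_k)\}$. The sketch below computes this conditional mean and simulates $R_j$ from (4.13). The function names, the choice of equal $\sigma_k$ with $j_0 = 5$, and the seed are illustrative assumptions.

```python
import math
import random


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cond_mean_rank(z, j, sigma):
    """E(R_j | Z_j = z) for R_j in (4.13); j is a 0-based index into sigma."""
    return 1 + sum(1 - norm_cdf(z * sigma[j] / s)
                   for k, s in enumerate(sigma) if k != j)


def draw_R(j, sigma, rng):
    """One draw of R_j = 1 + sum_{k != j} I(Z_j sigma_j < Z_k sigma_k)."""
    Z = [rng.gauss(0.0, 1.0) for _ in sigma]
    return 1 + sum(Z[j] * sigma[j] < Z[k] * s for k, s in enumerate(sigma) if k != j)


rng = random.Random(0)
sigma = [1.0] * 5                 # j0 = 5 virtually tied populations, equal variances
draws = [draw_R(0, sigma, rng) for _ in range(20000)]
mean_R = sum(draws) / len(draws)
var_R = sum((r - mean_R) ** 2 for r in draws) / len(draws)
print(mean_R, var_R)              # near 3 and 2: R_1 is uniform on {1, ..., 5} here
print(cond_mean_rank(0.0, 0, sigma))
```

With equal variances, $E(R_1 \mid Z_1) = 1 + 4\{1 - \Phi(Z_1)\}$ has variance $16/12$. This is smaller than $\mathrm{var}(R_1) = 2$, which illustrates the reduction in variability that conditioning can bring.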

4.3. Discussion. The assumptions underpinning Theorem 4.1 do not require the components of the data vectors $X_i = (X_{i1}, \ldots, X_{ip})$ to be independent, but they do ask that the estimators $\hat\theta_j$, corresponding to the true $\theta_j$'s that are virtually tied for the top $j_0$ positions, be asymptotically independent. See the first part of (4.9). That condition holds in many problems where $p$ is diverging but the components are strongly dependent, for example, when $\theta_j$ is a mean and the common distribution of the vectors $X_i$ is determined by adding $\theta_j$'s randomly to centred, although potentially strongly dependent, noise. For example, if the components of the noise process are $q$-dependent, where the integer $q$ is permitted to diverge with increasing $n$ and $p$, then in the case of fixed $j_0$ explored in Theorem 4.1, sufficient independence is ensured by the condition that $q/p \to 0$ as $p \to \infty$.

Parts (i) and (ii) of Theorem 4.1 together imply that (3.13) continues to hold in the present setting, provided $j$ is in the range $1 \le j \le j_0$.

As noted in Section 4.1, the result in the first part of Theorem 4.1(iii) does not hold if the synchronous bootstrap is used. Likewise, while the independent-component, m-out-of-n bootstrap can be proved to consistently estimate the distribution of $\mathrm{var}(\hat r_j \mid \mathcal{X}_j)$, neither the n-out-of-n bootstrap nor its m-out-of-n form gives consistency if applied using the conventional resampling algorithm at (4.1). The same challenges arise for a variety of other estimation problems; the problems treated in Theorem 4.1(iii) are merely examples.

In cases where $p$ is very much larger than $n$, and the aim is to discover information concealed in a very high-dimensional dataset, choosing $m$ for the m-out-of-n bootstrap might best be regarded as selecting the level of sensitivity rather than as choosing the level of smoothing in a more conventional, m-out-of-n bootstrap sense. Since the desired level of sensitivity depends on the unknown populations $\Pi_j$, and, in the most important marginal cases, is unknown, it may not always be appropriate to use a standard empirical approach to choosing $m$. Instead, numerical results for different values of $m$ could be obtained.

Results analogous to Theorem 3.3 can also be established in the present setting. In particular, in cases where $\hat r_j$ is highly variable, the standard n-out-of-n bootstrap correctly captures the order of magnitude, but not the constant multiplier, of characteristics of the distribution of $\hat r_j$, for example, its expected value and the lengths of associated prediction intervals.

4.4. Numerical properties. To gain insight into the advantages of the independent-component bootstrap, we consider the following setting. Suppose we have $p$ variables and $n$ observations, and the $j$th variable $X_j$ is modelled by $X_j = \theta_j + Z_j$, where $\theta_j$ is a constant, $Z_j$ is a standard normal random variable, $\mathrm{cor}(Z_j, Z_k) = \rho_n$ when $j \ne k$, and $\rho_n$ decreases to 0 as $n$ increases. We wish to compare the performance of the standard and independent-component bootstraps in the task of ranking the values of $\theta_j$. As our performance measure, we use the squared error criterion

$$\sum_j \sum_r \{P(\hat r_j^* = r \mid \mathcal{X}) - P(\hat r_j = r)\}^2 .$$

Figure 7 gives results for $n = 50$, $p = 200$ and $\theta_j = 1 - \{j/(p-1)\}$, for various choices of $\rho_n$. It shows that the independent-component bootstrap consistently improves performance. Interestingly, performance of the independent-component bootstrap is at its best when a reasonable level of correlation is present. This is apparently because, in the presence of correlation, the true ranking distribution becomes more "lumpy," or more degenerate.
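A scaled-down sketch of this comparison follows. It uses $n = 30$ and $p = 20$ (not the $n = 50$, $p = 200$ of Figure 7) so that it runs quickly. The equicorrelated noise is built from one shared factor per row, $Z = \rho^{1/2}W + (1-\rho)^{1/2}E$, which is our choice of construction. The "true" rank probabilities are approximated from fresh simulated datasets. All names and settings are illustrative, and a single dataset need not reproduce the ordering seen in Figure 7.

```python
import numpy as np


def simulate(n, theta, rho, rng):
    """n observations of X = theta + Z with unit variances and cor(Z_j, Z_k) = rho.

    Uses Z = sqrt(rho) W + sqrt(1 - rho) E with one shared factor W per row."""
    W = rng.standard_normal((n, 1))
    E = rng.standard_normal((n, theta.size))
    return theta + np.sqrt(rho) * W + np.sqrt(1 - rho) * E


def rank_probs(mean_draws):
    """P[j, r - 1] = fraction of rows in which column j has rank r (1 = largest)."""
    B, p = mean_draws.shape
    ranks = np.argsort(np.argsort(-mean_draws, axis=1), axis=1)    # 0-based ranks
    P = np.zeros((p, p))
    np.add.at(P, (np.tile(np.arange(p), B), ranks.ravel()), 1.0)
    return P / B


def boot_means(X, B, rng, independent):
    """Column means of B resamples: synchronous (4.1) or independent-component (4.5)."""
    n, p = X.shape
    if independent:
        idx = rng.integers(0, n, size=(B, n, p))    # separate row indices per column
        return X[idx, np.arange(p)].mean(axis=1)
    idx = rng.integers(0, n, size=(B, n))           # one set of row indices per resample
    return X[idx].mean(axis=1)


def squared_error(P_boot, P_true):
    """sum_j sum_r {P(r*_j = r | X) - P(r_j = r)}^2."""
    return float(np.sum((P_boot - P_true) ** 2))


rng = np.random.default_rng(7)
n, p, rho = 30, 20, 0.5
theta = 1 - np.arange(1, p + 1) / (p - 1)
# "True" rank probabilities, approximated from fresh datasets.
truth = rank_probs(np.array([simulate(n, theta, rho, rng).mean(axis=0)
                             for _ in range(2000)]))
X = simulate(n, theta, rho, rng)
for independent in (False, True):
    err = squared_error(rank_probs(boot_means(X, 1000, rng, independent)), truth)
    print("independent-component" if independent else "synchronous", round(err, 3))
```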

The Ro131 dataset was used by Segal et al. (2003) to compare a variety of genomic approaches. The dataset contains 30 independent observations, each with continuous expression levels $X_i$ for 6,319 genes, as well as a continuous response $Y_i$ for the expression level of a G protein-coupled receptor called Ro131. Hall and Miller (2009) used generalized correlation between the observed $Y$ and each set of gene expressions $\mathcal{X}_j$ to rank the genes, and then applied the standard, synchronous bootstrap to give indicative prediction intervals for these rankings. These results are presented in Figure 8 for the top 15 variables. It should be observed that significant levels of correlation exist between pairs of influential genes. There are at least two possible reasons for this. First, if gene expression levels closely follow the movements of the response variable, then genes will share some of this correlation indirectly. Second, there may be intrinsic correlation between two genes if they are controlled by some common underlying process.

If the first reason is suspected to be the dominant one, then the independent-component bootstrap should give a better indication of uncertainties in ranking. Figure 9 depicts results for the independent-component bootstrap. Notice that prediction interval widths are greater than in the synchronous case. This is because the positive correlations among the values of $\hat\theta_j$ in the synchronous case reduce the variation in rankings.




Fig. 7. Relative error of synchronous and independent-component bootstrap distributions (legend: Synchronous, Indep. component).

Fig. 8. Synchronous bootstrap results for Ro131 dataset.

Fig. 9. Independent-component bootstrap results for Ro131 dataset.



Fig. 10. Independent versus synchronous bootstrap results for Ro131 dataset.

Another plot that is useful in understanding rankings is that of conditional rankings, the subject of Theorem 4.1. Figure 10 shows the rankings for the top genes, together with prediction intervals for $\hat r_j^*$, conditional on both $\mathcal{X}$ and $\hat\theta_j$. Thus, for a given gene we have held its observed generalised correlation constant and bootstrapped on all other genes, to estimate how the genes should be ranked given the value of $\hat\theta_j$. The results of this analysis are highly dependent on whether the bootstrap is performed synchronously or independently. For reasons given in Sections 4.1-4.3, we prefer the independent-component bootstrap in this situation. Figure 10 displays the corresponding prediction intervals. Two features of the results are striking. First, the prediction intervals are very narrow compared to those seen in Figures 8 and 9, highlighting the fact that most of the uncertainty in ranking the $j$th gene comes from the uncertainty of $\hat\theta_j$ itself. Second, the prediction intervals lie below the actual point estimate for the rank. This suggests that if the experiment were performed again, we would be unlikely to see the top-ranked variables rank as highly as before. In fact, we would expect the top variable to rank outside the top twenty, even if it appeared as strongly as it did in our observed data. These two observations are interesting, and highlight the challenges of variable selection in such high-dimensional settings.
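The conditional ranking just described can be sketched as follows. We hold $\hat\theta_j$ at its observed value, resample every other variable independently as in (4.5), and rank $\hat\theta_j$ against the resampled values. Column means stand in for the generalized correlations used for the Ro131 data, and the function name, null data and seeds are hypothetical.

```python
import numpy as np


def conditional_rank_interval(X, j, B=1000, level=0.90, rng=None):
    """Prediction interval for the rank of variable j, holding theta_hat_j fixed.

    Column means stand in for the generalized correlations used in the text.
    theta_hat_j stays at its observed value; every other column is resampled
    independently, as in (4.5), and j is ranked against the resampled values.
    """
    rng = np.random.default_rng(rng)
    n, p = X.shape
    theta_j = X[:, j].mean()
    idx = rng.integers(0, n, size=(B, n, p))
    means = X[idx, np.arange(p)].mean(axis=1)        # (B, p) independent-component means
    others = np.delete(means, j, axis=1)
    ranks = 1 + (others > theta_j).sum(axis=1)       # rank 1 = largest
    tail = (1 - level) / 2
    return np.quantile(ranks, tail), np.quantile(ranks, 1 - tail)


# Hypothetical null data: 50 variables with equal true means.
rng = np.random.default_rng(4)
X = rng.standard_normal((30, 50))
top = int(np.argmax(X.mean(axis=0)))               # observed rank 1
print(conditional_rank_interval(X, top, rng=5))
```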

We reiterate here one observation relevant to both the independent-component bootstrap and the discussion of the m-out-of-n bootstrap in Section 3. When constructing prediction intervals for ranks, the method that produces the shortest intervals is not necessarily the most powerful or the most accurate. Both the theoretical and numerical results suggest that the synchronous bootstrap will produce widths that are too narrow compared to the theoretical ranking distribution; the bootstrap ranks become "anchored" to the observed empirical ranks. Thus, interpreting ranking sensitivities for real datasets involves balancing the power of an approach against the risk of overstating ranking accuracy. In many cases, it will be simulation and experimentation that suggest the best balance in a given situation.

The final example comprises a set of simulations that illustrate the results of Theorem 4.1 in a high-dimensional setting. The aim here is to estimate the correct distribution for the top five ranked variables. For each of six scenarios, we start with the base case of $n = 20$, $p = 500$, which was constructed as follows. The mean is once more the statistic of interest. Each data point $X_{ij}$ is normal with standard deviation 0.25, and the $j$th mean is $\theta_j = 1$ for $j = 1, \ldots, 5$ and is randomly sampled from the uniform distribution over $[0, 0.9]$ when $j > 5$. Once the data are generated, we may derive the ranking distribution using the independent-component bootstrap with $m = 20$. We use the statistic

$$\text{Error} = \sum_{j=1}^{5} \sum_{r=1}^{5} \{P(\hat r_j^* \le r \mid \mathcal{X}) - P(\hat r_j \le r)\}^2$$

to measure how accurately the rankings for the first five variables are estimated. Notice that $P(\hat r_j \le r) = r/5$ for $r = 1, \ldots, 5$, and that the error statistic is 0 if and only if this distribution is matched exactly in the bootstrapped distribution. We repeat this experiment 100 times and report the average error along with 90% confidence intervals for this average. From here, the simulation grows by increasing $n$ and increasing $m$ at rate $n/\log n$. In each scenario, $p$ is constant or grows at a linear or quadratic rate relative to $n$. Also, the gap between the mean of the top five variables and the upper end of the uniform sampling distribution is either left constant or shrunk at a square-rooted logarithmic rate. This results in six scenarios, the results of which are plotted in Figure 11. The error has been scaled so that 100 denotes the maximum possible error. Observe that the quadratic-growth simulations in particular achieve very high dimensions; when $n = 140$, $p = 24{,}500$, which is competitive with the dimensionality of many genomic applications.
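One replicate of the base case can be sketched as below. It generates data as described, applies the independent-component bootstrap with $m = 20$, and evaluates the unscaled error statistic against the target $r/5$. The function name, the number of resamples `B` and the seed are our assumptions. The sketch omits the rescaling to 100, the 100 repetitions and the growth schedules.

```python
import numpy as np


def error_statistic(X, m, B, rng, top=5):
    """sum_{j, r <= top} {P(r*_j <= r | X) - r / top}^2, independent-component bootstrap.

    The first `top` columns are the tied leaders; r*_j is the rank (1 = largest)
    of the j-th resampled mean among all p resampled means.
    """
    n, p = X.shape
    cols = np.arange(p)
    grid = np.arange(1, top + 1)
    cdf = np.zeros((top, top))
    for _ in range(B):
        idx = rng.integers(0, n, size=(m, p))          # separate indices per column
        means = X[idx, cols].mean(axis=0)
        ranks = 1 + (means[None, :] > means[:top, None]).sum(axis=1)
        cdf += ranks[:, None] <= grid[None, :]
    cdf /= B
    return float(np.sum((cdf - grid / top) ** 2))


rng = np.random.default_rng(11)
n, p, sd = 20, 500, 0.25
theta = np.r_[np.ones(5), rng.uniform(0.0, 0.9, size=p - 5)]
X = theta + sd * rng.standard_normal((n, p))
print(error_statistic(X, m=20, B=200, rng=rng))
```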

Theorem 4.1 establishes that under each of these scenarios the distribution of the top five variables should be estimated correctly, since $p$ increases only polynomially and the gap is either constant or shrinks sufficiently slowly; compare with (4.7) and (4.12). The results reinforce these findings, with error steadily decreasing in all cases except the quadratic ones. In these final cases, the error increases briefly until the stability of the means outweighs the effects of increasing $p$ and decreasing gap. The error then steadily decreases, albeit at a much slower rate than in the constant and linear scenarios. We can see that the problem is noticeably more difficult when the gap shrinks, as well as when $p$ grows at a faster rate.

This example was constructed to demonstrate that the theoretical results can hold while the data size remains computable. However, there are instances where very large $n$ is needed before such distributional accuracy is obtained. For instance, if we tripled the standard deviation in the final scenario, where we have quadratic growth in $p$ and a shrinking gap, we would require $n > 1000$ before the error started to decrease and satisfactory results were obtained. In this case, $p$ would be over one million, which is in excess of current desktop computer capability.

Fig. 11. Average error with 90% confidence intervals for $p > n$ simulations.

5. Technical arguments.

5.1. Proof of Theorem 3.1. (i) In view of the first part of (3.5), we may write

$$\hat r_j = 1 + \sum_{k:\,k \ne j} I(\hat\theta_j < \hat\theta_k) = 1 + \sum_{k:\,k \ne j} I(\sigma_j\Delta_j < \sigma_k\Delta_k + \omega_{jk}), \tag{5.1}$$

where the random variables $\Delta_k = n^{1/2}(\hat\theta_k - \theta_k)/\sigma_k$ are jointly independent and asymptotically standard normal. Result (3.6) can be proved from this quantity by considering the respective cases where values in the sequence $\omega_{jk}$, for $1 \le k \le p$, are finite or infinite.

(ii) To derive (3.7), we note that, in view of the second part of (3.5),

$$\begin{aligned}
\hat r_j^* &= 1 + \sum_{k:\,k \ne j} I(\hat\theta_j^* < \hat\theta_k^*) \\
&= 1 + \sum_{k:\,k \ne j} I(n^{-1/2}\sigma_j\Delta_j^* < n^{-1/2}\sigma_k\Delta_k^* + \hat\theta_k - \hat\theta_j) \\
&= 1 + \sum_{k:\,k \ne j} I(\sigma_j\Delta_j + \sigma_j\Delta_j^* < \sigma_k\Delta_k + \sigma_k\Delta_k^* + \omega_{jk}),
\end{aligned} \tag{5.2}$$

where, conditional on $\mathcal{X}$, the random variables $\Delta_k^* = n^{1/2}(\hat\theta_k^* - \hat\theta_k)/\sigma_k$ are jointly independent and asymptotically standard normal, and the $\Delta_k$'s are as in (5.1). Since, by the first part of (3.5), the $\Delta_k$'s are asymptotically independent and standard normal (in an unconditional sense), then by Kolmogorov's extension theorem we can (on a sufficiently large probability space) find random variables $Z_1, \ldots, Z_p$ which depend on $n$, are exactly independent and exactly standard normal for each $n$, and have the property that $\Delta_k = Z_k + o_p(1)$ for each $k$, as $n \to \infty$. Result (3.7) follows from these properties and (5.2).

(iii) Result (5.2) continues to hold in the case of the m-out-of-n bootstrap, except that to obtain the arguments of the indicator functions there we have to multiply throughout by $m^{1/2}$ rather than $n^{1/2}$. This means that to interpret (5.2), we should redefine $\Delta_k = m^{1/2}(\hat\theta_k - \theta_k)/\sigma_k$ and $\Delta_k^* = m^{1/2}(\hat\theta_k^* - \hat\theta_k)/\sigma_k$. Since $m/n \to 0$, then, on the present occasion, $\Delta_k \to 0$ in probability for each $k$, but, in view of the second part of (3.5), the conditional distribution of $\Delta_k^*$ continues to be asymptotically normal $N(0, 1)$. Result (3.8) now follows from (5.2).

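5.2. Proof of Theorem 3.3. Observe from (3.17), (3.18) and (5.1) that

$$\begin{aligned}
E(\hat r_j) - 1 &= \sum_{k:\,k \ne j} P(\hat\theta_j < \hat\theta_k) = \sum_{k:\,k \ne j} P\{\Delta_j < \Delta_k + 2^{1/2}\delta(j-k)\} \\
&= \{1 + o(1)\} \sum_{k:\,k \ne j} \Phi\{\delta(j-k)\} + o(\delta^{-1}) \\
&= \delta^{-1} \int_{-j\delta}^{\infty} \Phi(-x)\,dx + o(\delta^{-1}),
\end{aligned}$$

where $\Delta_k = n^{1/2}(\hat\theta_k - \theta_k)/\sigma$. This gives (3.20). Similarly, (3.21) follows from

$$\begin{aligned}
E(\hat r_j^* \mid \mathcal{X}) - 1 &= \sum_{k:\,k \ne j} P(\hat\theta_j^* < \hat\theta_k^* \mid \mathcal{X}) \\
&= \sum_{k:\,k \ne j} P\{\Delta_j^* < \Delta_k^* + 2^{1/2}\Delta_{jk} + 2^{1/2}\delta(j-k) \mid \mathcal{X}\} \\
&= \{1 + o_p(1)\} \sum_{k:\,k \ne j} \Phi\{\Delta_{jk} + \delta(j-k)\} + o_p(\delta^{-1}),
\end{aligned}$$

where $\Delta_k^* = n^{1/2}(\hat\theta_k^* - \hat\theta_k)/\sigma$.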
5.3. Proof of Theorem 4.1. (i) By (4.9), the probability that $|\hat\theta_j - \theta_j| > C_2(n^{-1}\log n)^{1/2}$ for some $j = 1, \ldots, p$ equals $O(p\,n^{-C_1}) = o(1)$, where we used (4.7) to obtain the last identity. Therefore, by (4.6), (4.9) and (4.8), for each $C > 0$, the probability that $\hat\theta_j - \hat\theta_k > C(n^{-1}\log n)^{1/2}$ for all $j = 1, \ldots, j_0$ and all $k = j_0 + 1, \ldots, p$ converges to 1 as $n \to \infty$. From this result and the first part of (4.9), it follows that for $1 \le j \le j_0$,

$$\hat r_j = 1 + \sum_{k:\,k \ne j} I(\hat\theta_j < \hat\theta_k) = 1 + \sum_{k:\,k \le j_0,\,k \ne j} I\{\sigma_j W_j < \sigma_k W_k + n^{1/2}(\theta_k - \theta_j)\} + \Delta_j,$$

where the random variables $W_k = n^{1/2}(\hat\theta_k - \theta_k)/\sigma_k$, for $1 \le k \le j_0$, are asymptotically independent and distributed as normal $N(0, 1)$, and $P(\Delta_j = 0) \to 1$ as $n \to \infty$.

(ii) In the bootstrap case, it follows from the second formula in (4.3) that

$$\hat r_j^* = 1 + \sum_{k:\,k \le j_0,\,k \ne j} I\{m^{1/2}(\hat\theta_j^* - \hat\theta_j) + \Delta_{jk} < m^{1/2}(\hat\theta_k^* - \hat\theta_k)\} + \Delta_j^*, \tag{5.3}$$

where, if $n$ is so large that $\inf_{1 \le j \le j_0} \inf_{j_0 < k \le p} (\theta_j - \theta_k) > 4C_2(m^{-1}\log m)^{1/2}$, then

$$\sup_{1 \le k \le j_0} |\Delta_{jk}| \le 2m^{1/2}\Big(\sup_{1 \le j \le j_0} |\hat\theta_j - \theta_j| + \sup_{1 \le j_1, j_2 \le j_0} |\theta_{j_1} - \theta_{j_2}|\Big) \to 0, \tag{5.4}$$

$$P(\Delta_j^* \ne 0) \le p \sup_{1 \le k \le p} \Big[P\{|\hat\theta_k - \theta_k| > C_2(m^{-1}\log m)^{1/2}\} + P\{|\hat\theta_k^* - \hat\theta_k| > C_2(m^{-1}\log m)^{1/2}\}\Big] \to 0. \tag{5.5}$$

The convergence in (5.4) is in probability and is a consequence of (4.9), (4.10) and the fact that $m/n \to 0$, and (5.5) follows from (4.7) and the second parts of (4.9) and (4.12). Part (ii) of Theorem 4.1 follows from (5.3)-(5.5).

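(iii) Note that

$$E(\hat r_j^* \mid \mathcal{X}, \mathcal{X}_j^*) - 1 = \sum_{k:\,k \ne j} P(\hat\theta_j^* < \hat\theta_k^* \mid \mathcal{X}, \mathcal{X}_j^*) = S_1 + (S_2 + S_3 + S_4)\,\Theta,$$

where $P(0 \le \Theta \le 1) = 1$,

$$S_1 = \sum_{k=1,\,k \ne j}^{j_0} P(\hat\theta_j^* < \hat\theta_k^* \mid \mathcal{X}, \mathcal{X}_j^*),$$

$$S_2 = \sum_{k=j_0+1}^{p} I\{\theta_j - \theta_k < 4C_2(m^{-1}\log m)^{1/2}\},$$

$$S_3 = \sum_{k=1}^{p} I\{|\hat\theta_k - \theta_k| > C_2(m^{-1}\log m)^{1/2}\},$$

$$S_4 = \sum_{k=1}^{p} P\{|\hat\theta_k^* - \hat\theta_k| > C_2(m^{-1}\log m)^{1/2} \mid \mathcal{X}, \mathcal{X}_j^*\}.$$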
In view of (4.10) and (4.11), $S_2 = 0$ for all sufficiently large $n$; by (4.7) and the second part of (4.9), $E(S_3) = o(1)$; and by (4.7) and the second part of (4.12), $E(S_4) = o(1)$. Therefore, $E(S_2 + S_3 + S_4) = o(1)$; call this result (R). Since, using the independent-component bootstrap, $\mathcal{X}_j^*$ and $\mathcal{X}_k^*$ (for $k \ne j$) are independent conditional on $\mathcal{X}$; and since

$$P(\hat\theta_j^* < \hat\theta_k^* \mid \mathcal{X}, \mathcal{X}_j^*) = P\{m^{1/2}(\hat\theta_j^* - \hat\theta_j) + m^{1/2}(\hat\theta_j - \hat\theta_k) < m^{1/2}(\hat\theta_k^* - \hat\theta_k) \mid \mathcal{X}, \mathcal{X}_j^*\};$$

then it follows from (4.6), the first parts of (4.9) and (4.12), and Kolmogorov's extension theorem, that the joint distribution function of $P(\hat\theta_j^* < \hat\theta_k^* \mid \mathcal{X}, \mathcal{X}_j^*)$, for $1 \le k \le j_0$ and $k \ne j$ (and conditional on $\mathcal{X}$), minus the joint distribution function of $P(Z_j\sigma_j < Z_k\sigma_k \mid Z_j)$ for $1 \le k \le j_0$ and $k \ne j$ (for independent standard normal random variables $Z_k$ defined on an enlarged probability space), converges to zero in probability in any integral metric on a compact set. Therefore, the distribution function of $S_1 + 1$, conditional on $\mathcal{X}$, minus the distribution function of $E(R_j \mid Z_j)$, converges in probability to zero. [Here, $R_j$ is the function of $Z_1, \ldots, Z_{j_0}$ defined at (4.13), and the construction of $Z_1, \ldots, Z_{j_0}$ involves them being measurable in the sigma-field generated by $\mathcal{X} \cup \mathcal{X}^*$.] This property and result (R) together imply (4.14).

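To derive the final portion of part (iii) of Theorem 4.1, note that the argument leading to (4.14) implies that

$$E\Big\{\sum_{k=j_0+1}^{p} P(\hat\theta_j^* < \hat\theta_k^* \mid \mathcal{X}, \mathcal{X}_j^*)\Big\} = o(1).$$

Therefore,

$$\begin{aligned}
E\{(\hat r_j^* - 1)^2 \mid \mathcal{X}\} &= \sum_{k_1, k_2:\,k_1, k_2 \ne j} E\{P(\hat\theta_j^* < \hat\theta_{k_1}^* \mid \mathcal{X}, \mathcal{X}_j^*)\, P(\hat\theta_j^* < \hat\theta_{k_2}^* \mid \mathcal{X}, \mathcal{X}_j^*) \mid \mathcal{X}\} \\
&= \sum_{k_1, k_2:\,k_1, k_2 \ne j,\ 1 \le k_1, k_2 \le j_0} E\{P(\hat\theta_j^* < \hat\theta_{k_1}^* \mid \mathcal{X}, \mathcal{X}_j^*)\, P(\hat\theta_j^* < \hat\theta_{k_2}^* \mid \mathcal{X}, \mathcal{X}_j^*) \mid \mathcal{X}\} + o_p(1) \\
&= T_2 + o_p(1),
\end{aligned}$$

where, for $\ell = 1, 2$,

$$T_\ell = E\Big[\Big\{\sum_{k:\,k \ne j,\ 1 \le k \le j_0} P(\hat\theta_j^* < \hat\theta_k^* \mid \mathcal{X}, \mathcal{X}_j^*)\Big\}^{\ell} \,\Big|\, \mathcal{X}\Big].$$

More simply, $E(\hat r_j^* - 1 \mid \mathcal{X}) = T_1 + o_p(1)$. The argument in the previous paragraph can be used to show that $T_2$ and $T_1$ converge in probability to $E\{E(R_j - 1 \mid Z_j)^2\}$ and $E(R_j - 1)$, respectively. Since $E\{E(R_j - 1 \mid Z_j)^2\} = E\{(R_j - 1)^2\}$, $\mathrm{var}(\hat r_j^* \mid \mathcal{X})$ converges in probability to $\mathrm{var}(R_j)$, as had to be proved.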